Methods and products related to genotyping and DNA analysis

ABSTRACT

The invention encompasses methods and products related to genotyping. The method of genotyping of the invention is based on the use of single nucleotide polymorphisms (SNPs) to perform high throughput genome scans. The high throughput method can be performed by hybridizing SNP allele-specific oligonucleotides and a reduced complexity genome (RCG). The invention also relates to methods of preparing the SNP specific oligonucleotides and RCGs, methods of fingerprinting, determining allele frequency for a SNP, characterizing tumors, generating a genomic classification code for a genome, identifying previously unknown SNPs, and related compositions and kits.

RELATED APPLICATIONS

[0001] This application claims priority to U.S. Provisional Application No. 60/101,757, filed Sep. 25, 1998, the entire contents of which are hereby incorporated by reference.

GOVERNMENT SUPPORT

[0002] The present invention was supported in part by a grant from the United States National Institutes of Health under contract/grant number 5-R01-HG00299-18; the National Cancer Institute of Canada under contract/grant # 009645;007477; National Research Foundation DHHS, NIH, NCI, 5 F32 CA73118-03 and NIH Predoctoring Grant T32 GM07287. The U.S. Government may retain certain rights in the invention.

FIELD OF THE INVENTION

[0003] The present invention relates to methods and products associated with genotyping. In particular, the invention relates to methods of detecting single nucleotide polymorphisms and reduced complexity genomes for use in genotyping methods as well as to various methods of genotyping, fingerprinting, and genomic analysis. The invention also relates to products and kits, such as panels of single nucleotide polymorphism allele specific oligonucleotides, reduced complexity genomes, and databases for use in the methods of the invention.

BACKGROUND OF THE INVENTION

[0004] Genomic DNA varies significantly from individual to individual, except in identical siblings. Many human diseases arise from genomic variations. The genetic diversity amongst humans and other life forms explains the heritable variations observed in disease susceptibility. Diseases arising from such genetic variations include Huntington's disease, cystic fibrosis, Duchenne muscular dystrophy, and certain forms of breast cancer. Each of these diseases is associated with a single gene mutation. Diseases such as multiple sclerosis, diabetes, Parkinson's, Alzheimer's disease, and hypertension are much more complex. These diseases may be due to polygenic (multiple gene influences) or multifactorial (multiple gene and environmental influences) causes. Many of the variations in the genome do not result in a disease trait. However, as described above, a single mutation can result in a disease trait.

[0005] The ability to scan the human genome to identify the location of genes which underlie or are associated with the pathology of such diseases is an enormously powerful tool in medicine and human biology. Several types of sequence variations, including insertions and deletions, differences in the number of repeated sequences, and single base pair differences, result in genomic diversity.

[0006] Single base pair differences, referred to as single nucleotide polymorphisms (SNPs), are the most frequent type of variation in the human genome (occurring at approximately 1 in 10³ bases). A SNP is a genomic position at which two or more alternative nucleotide alleles occur at a relatively high frequency (greater than 1%) in a population. SNPs are well-suited for studying sequence variation because they are relatively stable (i.e., exhibit low mutation rates) and because single nucleotide variations can be responsible for inherited traits.

[0007] Polymorphisms identified using microsatellite-based analysis, for example, have been used for a variety of purposes. Use of genetic linkage strategies to identify the locations of single Mendelian factors has been successful in many cases (Benomar et al. (1995), Nat. Genet., 10:84-8; Blanton et al. (1991), Genomics, 11:857-69). Identification of chromosomal locations of tumor suppressor genes has generally been accomplished by studying loss of heterozygosity in human tumors (Cavenee et al. (1983), Nature, 305:779-784; Collins et al. (1996), Proc. Natl. Acad. Sci. USA, 93:14771-14775; Koufos et al. (1984), Nature, 309:170-172; and Legius et al. (1993), Nat. Genet., 3:122-126). Additionally, use of genetic markers to infer the chromosomal locations of genes contributing to complex traits, such as type I diabetes (Davis et al. (1994), Nature, 371:130-136; Todd et al. (1995), Proc. Natl. Acad. Sci. USA, 92:8560-8565), has become a focus of research in human genetics.

[0008] Although substantial progress has been made in identifying the genetic basis of many human diseases, current methodologies used to develop this information are limited by prohibitive costs and the extensive amount of work required to obtain genotype information from large sample populations. These limitations make identification of complex gene mutations contributing to disorders such as diabetes extremely difficult. Techniques for scanning the human genome to identify the locations of genes involved in disease processes began in the early 1980s with the use of restriction fragment length polymorphism (RFLP) analysis (Botstein et al. (1980), Am. J. Hum. Genet., 32:314-31; Nakamura et al. (1987), Science, 235:1616-22). RFLP analysis involves Southern blotting and other techniques. Southern blotting is both expensive and time-consuming when performed on large numbers of samples, such as those required to identify a complex genotype associated with a particular phenotype. Some of these problems were avoided with the development of polymerase chain reaction (PCR) based microsatellite marker analysis. Microsatellite markers are simple sequence length polymorphisms (SSLPs) consisting of di-, tri-, and tetra-nucleotide repeats.

[0009] Other types of genomic analysis are based on use of markers which hybridize with hypervariable regions of DNA having multiallelic variation and high heterozygosity. The variable regions which are useful for fingerprinting genomic DNA are tandem repeats of a short sequence referred to as a minisatellite. Polymorphism is due to allelic differences in the number of repeats, which can arise as a result of mitotic or meiotic unequal exchanges or by DNA slippage during replication.

[0010] The most commonly used method for genotyping involves Weber markers, which are abundant interspersed repetitive DNA sequences, generally of the form (dC-dA)_(n) (dG-dT)_(n). Weber markers exhibit length polymorphisms and are therefore useful for identifying individuals in paternity and forensic testing, as well as for mapping genes involved in genetic diseases. In the Weber method of genotyping, generally 400 Weber or microsatellite markers are used to scan each genome using PCR. Using these methods, if 5,000 individual genomes are scanned, 2 million PCR reactions are performed (5,000 genomes × 400 markers). The number of PCR reactions may be reduced by multiplexing, in which, for instance, four different sets of primers are reacted simultaneously in a single PCR, thus reducing the total number of PCRs for the example provided to 500,000. The 500,000 PCR mixtures are separated by polyacrylamide gel electrophoresis (PAGE). If the samples are run on a 96-lane gel, approximately 5,200 gels must be run to analyze all 500,000 PCR reaction mixtures. PCR products can be identified by their position on the gels, and the differences in length of the products can be determined by analyzing the gels. One problem with this type of analysis is that “stuttering” tends to occur, causing a smeared result and making the data difficult to interpret and score.
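By way of non-limiting illustration, the workload arithmetic of the preceding example can be reproduced with a short Python sketch. The function name and its parameters are hypothetical and are introduced here only for illustration; the sketch assumes one PCR mixture per gel lane and rounds partial gels up, which gives 5,209 gels, a figure the text rounds to approximately 5,200.

```python
import math


def weber_scan_workload(genomes, markers, multiplex, lanes_per_gel):
    """Return (single-plex PCRs, multiplexed PCRs, gels) for a marker scan.

    Hypothetical helper: assumes one PCR mixture per gel lane and that a
    partially filled gel still counts as one gel.
    """
    singleplex = genomes * markers                    # one PCR per genome per marker
    multiplexed = math.ceil(singleplex / multiplex)   # several primer sets per PCR
    gels = math.ceil(multiplexed / lanes_per_gel)     # one mixture per lane
    return singleplex, multiplexed, gels


# Figures from the example: 5,000 genomes, 400 markers, 4-plex PCR, 96-lane gels.
print(weber_scan_workload(5000, 400, 4, 96))
```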

[0011] More recent advances in genotyping are based on automated technologies utilizing DNA chips, such as the Affymetrix HuSNP Chip™ analysis system. The HuSNP Chip™ is a disposable array of DNA molecules on a chip (400,000 per half inch square slide). The single stranded DNA molecules bound to the slide are present in an ordered array of molecules having known sequences, some of which are complementary to one allele of a SNP-containing portion of a genome. If the same 5,000 individual genome study described above is performed using the Affymetrix HuSNP Chip™ analysis system, approximately 5,000 gene chips having 1,000 or more SNPs per chip would be required. Prior to the chip scan, the genomic DNA samples would be amplified by PCR in a similar manner to conventional microsatellite genotyping. The gene chip method is also expensive and time-intensive.

SUMMARY OF THE INVENTION

[0012] The present invention relates to methods and products for identifying points of genetic diversity in genomes of a broad spectrum of species. In particular, the invention relates to a high throughput method of genotyping of SNPs in a genome (e.g. a human genome) using reduced complexity genomes (RCGs) and, in some exemplary embodiments, using SNP allele specific oligonucleotides (SNP-ASO) and specific hybridization reactions performed, for example, on a surface. The method of genotyping, in some aspects of the invention, is accomplished by scanning a RCG for the presence or absence of a SNP allele. Using this method, tens of thousands of genomes from one species may be simultaneously assayed for the presence or absence of each allele of a SNP. The methods can be automated, and the results can be recorded using a microarray scanner or other detection/recordation devices.

[0013] The invention encompasses several improvements over prior art methods. For instance, a genome-wide scan of thousands of individuals can be carried out at a fraction of the cost and time required by many prior art genotyping methods.

[0014] The invention, in one aspect, is a method for detecting the presence of a SNP allele in a genomic sample. The method, in one aspect, includes preparing a RCG from a genomic sample and analyzing the RCG for the presence of the SNP allele. In some aspects, the analysis is performed using a hybridization reaction involving a SNP allele specific oligonucleotide (SNP-ASO) which is complementary to a given allele of the SNP and the RCG. If the allele of the SNP is present in the genomic sample, then the SNP-ASO hybridizes with the RCG.

[0015] In some aspects, the method is a method for determining a genotype of a genome, whereby the genotype is identified by the presence or absence of alleles of the SNP in the RCG. In other aspects, the method is a method for characterizing a tumor, wherein the RCG is isolated from a genome obtained from a tumor of a subject and wherein the tumor is characterized by the presence or absence of an allele of the SNP in the RCG.

[0016] In other aspects, the method is a method for determining allelic frequency for a SNP, and further comprises determining the number of arbitrarily selected genomes from a population which include each allele of the SNP in order to determine the allelic frequency of the SNP in the population.

[0017] In some embodiments, the hybridization reaction is performed on a surface and the RCG or the SNP-ASO is immobilized on the surface. In yet other embodiments, the SNP-ASO is hybridized with a plurality of RCGs in individual reactions.

[0018] In other aspects, the method includes performing a hybridization reaction involving a RCG and a surface having a SNP-ASO immobilized thereon, repeating the hybridization with a plurality of RCGs from the plurality of genomes, and determining the genotype based on whether the SNP-ASO hybridizes with at least some of the RCGs.

[0019] The RCG may be a PCR-derived RCG or a native RCG. In some embodiments, the RCG is prepared by performing degenerate oligonucleotide priming-PCR (DOP-PCR) using a degenerate oligonucleotide primer having a tag-(N)_(x)-TARGET nucleotide sequence, wherein the TARGET nucleotide sequence includes at least 7 TARGET nucleotides and wherein x is an integer from 0 to 9, and wherein N is any nucleotide. In various embodiments, the TARGET nucleotide sequence includes 8, 9, 10, 11, or 12 nucleotide residues. In other embodiments, x is an integer from 3 to 9 (e.g. 6, 7, 8, or 9). Preferably, the method of genotyping is performed to determine genotypes at more than one locus. In other embodiments, the RCG is prepared by performing DOP-PCR using a degenerate oligonucleotide primer having a tag-(N)_(x)-TARGET nucleotide sequence, wherein the TARGET nucleotide sequence includes fewer than 7 TARGET nucleotide residues and wherein x is an integer from 0 to 9, and wherein N is any nucleotide residue.

[0020] The methods can be performed on a support. Preferably, the support is a solid support such as a glass slide, a membrane such as a nitrocellulose membrane, etc.

[0021] In yet other embodiments, the RCG is prepared by interspersed repeat sequence-PCR (IRS-PCR), arbitrarily primed-PCR (AP-PCR), adapter-PCR, or multiple primed DOP-PCR.

[0022] In a preferred embodiment, the methods are useful for determining a genotype associated with or linked to a specific phenotype, and the distinct isolated genomes or RCGs are associated with a common phenotype.

[0023] The SNP-ASO used according to the methods of the invention are polynucleotides including one allele of two possible nucleotides at the polymorphic site. In one embodiment, the SNP-ASO is composed of from about 10 to 50 nucleotides. In a preferred embodiment, the SNP-ASO is composed of from about 10 to 25 nucleotides.

[0024] According to one embodiment, the SNP-ASO is labeled. The methods can, optionally, also include addition of an excess of non-labeled SNP-ASO in which the polymorphic nucleotide residue corresponds to a different allele of the SNP and which is added during the hybridization step. Additionally, a parallel reaction may be performed wherein the labeling of the two SNP-ASOs is reversed. The label on the SNP-ASO in one embodiment is a radioactive isotope. In this embodiment, the labeled hybridized products on the surface may be exposed to an X-ray film to produce a signal on the film which corresponds to the radioactively labeled hybridization products. In another embodiment, the SNP-ASO is labeled with a fluorescent molecule. In this embodiment, the labeled hybridized products on the surface may be exposed to an automated fluorescence reader to generate an output signal which corresponds to the fluorescently labeled hybridization products.

[0025] According to one embodiment, the RCG is labeled. The label on the RCG in one embodiment is a radioactive isotope. In this embodiment, the labeled hybridized products on the surface may be exposed to an X-ray film to produce a signal on the film which corresponds to the radioactively labeled hybridization products. In another embodiment, the RCG is labeled with a fluorescent molecule. In this embodiment, the labeled hybridized products on the surface may be exposed to an automated fluorescence reader to generate an output signal which corresponds to the fluorescently labeled hybridization products.

[0026] In one embodiment, a plurality of different SNP-ASOs are attached to the surface. In another embodiment, the plurality includes at least 500 different SNP-ASOs. In yet another embodiment, the plurality includes at least 1000 different SNP-ASOs.

[0027] In another embodiment, a plurality of SNP-ASOs are labeled with fluorescent molecules, each SNP-ASO being labeled with a spectrally distinct fluorescent molecule. In various embodiments, the number of spectrally distinct fluorescent molecules is two, three, four, five, six, seven, or eight.

[0028] In yet another embodiment, the plurality of RCGs are labeled with fluorescent molecules, each RCG being labeled with a spectrally distinct fluorescent molecule. All of the RCGs having a spectrally distinct fluorescent molecule can be hybridized with a single support. In various embodiments the number of spectrally distinct fluorescent molecules is two, three, four, five, six, seven, or eight.

[0029] According to other aspects, the invention encompasses methods for characterizing a tumor by assessing the loss of heterozygosity, determining allelic frequency for a SNP, generating a genomic pattern for an individual genome, and generating a genomic classification code for a genome.

[0030] In one aspect, the method for characterizing a tumor includes isolating genomic DNA from tumor samples obtained from a plurality of subjects, preparing a plurality of RCGs from the genomic DNA, performing a hybridization reaction involving a SNP-ASO and the plurality of RCGs (e.g. immobilized on a surface), and identifying the presence of a SNP allele in the genomic DNA based on whether the SNP-ASO hybridizes with at least some of the RCGs in order to characterize the tumor. One or more of the RCGs or one or more of the SNP-ASOs can be immobilized on a surface.

[0031] In another aspect, the invention is a method for generating a genomic pattern for an individual genome. The method, in one aspect, includes preparing a plurality of RCGs, analyzing the RCGs for the presence of one or more SNP alleles, and identifying a genomic pattern of SNPs for each RCG by determining the presence or absence therein of SNP alleles. In some embodiments, the analysis involves performing a hybridization reaction involving a panel of SNP-ASOs (e.g. ones which are each complementary to one allele of a SNP), and the plurality of RCGs. The genomic pattern can be identified by determining the presence or absence of a SNP allele for each RCG by detecting whether the SNP-ASOs hybridize with the RCGs. In one embodiment, a plurality of SNP-ASOs are hybridized with the support, and each SNP-ASO of the panel is hybridized with a different support than the other SNP-ASO.

[0032] In some embodiments, the genomic pattern is a genomic classification code which is generated from the pattern of SNP alleles for each RCG. In other embodiments, the genomic classification code is also generated from the allelic frequency of the SNPs. In yet other embodiments, the genomic pattern is a visual pattern. The genomic pattern may be in physical or electronic form.

[0033] In another aspect, the invention is a method for generating a genomic pattern for an individual genome. The method includes identifying a genomic pattern of SNP alleles for each RCG by determining the presence or absence therein of selected SNP alleles.

[0034] A method for generating a genomic classification code for a genome is provided in another aspect of the invention. The method includes preparing a RCG, analyzing the RCG for the presence of one or more SNP alleles (e.g. ones of known allelic frequency), identifying a genomic pattern of SNP alleles for the RCG by determining the presence or absence therein of SNP alleles, and generating a genomic classification code for the RCG based on the presence or absence (and, optionally, the allelic frequency) of the SNP alleles. In some embodiments, the analysis involves performing a hybridization reaction involving the RCG and a panel of SNP-ASOs (e.g. corresponding to SNP alleles of known allelic frequency), each of which is complementary to one allele of a SNP. The genomic pattern is identified based on whether each SNP-ASO hybridizes with the RCG.
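By way of non-limiting illustration, one possible way to turn a presence/absence pattern into a genomic classification code is sketched below in Python. The bit-string encoding, the function name, and the panel identifiers are assumptions made for this sketch only; no particular encoding is prescribed above, and the optional use of allelic frequency is not modeled.

```python
def classification_code(panel, hybridized):
    """Encode a presence/absence pattern as a bit string.

    panel      -- ordered list of SNP-ASO identifiers (hypothetical names)
    hybridized -- set of identifiers for which hybridization was detected
    Each position holds "1" if that SNP-ASO hybridized with the RCG, else "0".
    """
    return "".join("1" if aso in hybridized else "0" for aso in panel)


# Hypothetical four-member panel covering both alleles of two SNPs.
panel = ["snp1:A", "snp1:G", "snp2:C", "snp2:T"]
detected = {"snp1:A", "snp2:C", "snp2:T"}  # homozygous at snp1, heterozygous at snp2

code = classification_code(panel, detected)
print(code)          # bit string, one position per SNP-ASO
print(int(code, 2))  # the same pattern as a compact integer
```

Because the panel order is fixed, two RCGs screened with the same panel yield directly comparable codes.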

[0035] The method for determining allelic frequency for a SNP, in another aspect, includes preparing a plurality of RCGs from distinct isolated genomes, performing a hybridization reaction involving one RCG and a surface having a SNP-ASO immobilized thereon, repeating the hybridization with each of the plurality of RCGs, and determining the number of RCGs which include each allele of the SNP in order to determine the allelic frequency of the SNP. In other embodiments the RCGs are immobilized on the surface.
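By way of non-limiting illustration, the counting step described above can be sketched in Python. The input format (one set of detected alleles per RCG) and the function name are hypothetical. The sketch reports two quantities: the fraction of RCGs in which each allele was detected, which is the count described above, and an allele frequency that additionally assumes diploid genomes, so that an RCG showing a single allele is counted as carrying two copies of it.

```python
from collections import Counter


def allele_frequencies(calls):
    """Summarize per-RCG allele calls for one SNP.

    calls -- list with one set of detected alleles per RCG, e.g. {"G"} or
             {"A", "G"}; an empty set means no call and is skipped.
    Returns (carrier_fraction, allele_frequency) as two dicts.
    """
    carriers = Counter()  # RCGs in which the allele was detected
    copies = Counter()    # allele copies, assuming diploid genomes
    called = 0
    for alleles in calls:
        if not alleles:
            continue
        called += 1
        for allele in alleles:
            carriers[allele] += 1
            # A single detected allele is treated as homozygous (two copies).
            copies[allele] += 2 if len(alleles) == 1 else 1
    total_copies = sum(copies.values())
    carrier_fraction = {a: n / called for a, n in carriers.items()}
    allele_frequency = {a: n / total_copies for a, n in copies.items()}
    return carrier_fraction, allele_frequency


calls = [{"G"}, {"G", "A"}, {"A"}, {"G"}]
fraction, frequency = allele_frequencies(calls)
print(fraction)
print(frequency)
```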

[0036] In another aspect, the method for generating a genomic pattern for an individual genome includes preparing a plurality of RCGs, performing a hybridization reaction involving a RCG and a surface having a SNP-ASO immobilized thereon, repeating the hybridization step with each of the plurality of RCGs, and identifying a genomic pattern of SNPs for each RCG by determining the presence therein of SNPs based on whether each SNP-ASO hybridizes with each RCG.

[0037] The method for generating a genomic classification code for a genome, in another aspect, includes preparing a RCG, performing a hybridization reaction involving the RCG and a panel of SNP-ASOs (e.g. immobilized on a surface), identifying a genomic pattern of SNPs for the RCG by determining the presence therein of SNPs based on whether each SNP-ASO hybridizes with the RCG, and generating a genomic classification code for the RCG based on the identities of the SNPs which hybridize with the RCG, the identities of the SNPs which do not hybridize with the RCG, and, optionally, also based on the allelic frequency of the SNPs. In one embodiment, each SNP-ASO of the panel is immobilized on a separate surface. In another embodiment, more than one SNP-ASO of the panel is immobilized on the same surface, each SNP-ASO being immobilized on a distinct area of the surface.

[0038] In an embodiment, the genomic classification code is encoded as one or more computer-readable signals on a computer-readable medium. In other aspects of the invention, compositions are provided. According to one aspect, the composition is a plurality of RCGs immobilized on a surface, wherein the RCGs are prepared by a method including the step of performing DOP-PCR using a DOP primer having a tag-(N)_(x)-TARGET nucleotide sequence, wherein the TARGET nucleotide sequence includes at least 7 nucleotide residues, wherein x is an integer from 0 to 9, and wherein N is any nucleotide residue. In various embodiments, the TARGET nucleotide sequence includes 8, 9, 10, 11, or 12 nucleotide residues. In other embodiments, x is an integer from 3 to 9 (e.g. 6, 7, 8 or 9).

[0039] According to another aspect, the composition is a panel of SNP-ASOs immobilized on a surface, wherein the SNPs are identified by a method including preparing a set of primers from a RCG, performing PCR using the set of primers on a plurality of isolated genomes to yield DNA products, isolating and, optionally, sequencing the DNA products, and identifying a SNP based on the sequences of the PCR products. In one embodiment, the plurality of isolated genomes includes at least four isolated genomes.

[0040] According to another aspect of the invention, a kit is provided. The kit includes a container housing a set of PCR primers for reducing the complexity of a genome, and a container housing a set of SNP-ASOs. The SNPs which correspond to the SNP-ASOs of the kit are preferably present within a RCG made using the PCR primers of the kit with a frequency of at least 50%.

[0041] In one embodiment, the set of PCR primers are primers for DOP-PCR. Preferably, the degenerate oligonucleotide primer has a tag-(N)_(x)-TARGET nucleotide sequence, wherein the TARGET nucleotide sequence includes at least 7 nucleotide residues, wherein x is an integer from 0 to 9, and wherein N is any nucleotide residue. In various embodiments, the TARGET nucleotide sequence includes 8, 9, 10, 11, or 12 nucleotide residues. In other embodiments, x is an integer from 3 to 9 (e.g., 6, 7, 8 or 9).

[0042] In yet other embodiments, the RCG is prepared by IRS-PCR, AP-PCR, or adapter-PCR.

[0043] The SNP-ASOs of the invention are polynucleotides including one of the alternative nucleotides at a polymorphic nucleotide residue of a SNP. In one embodiment, the SNP-ASO is composed of from about 10 to 50 nucleotide residues. In a preferred embodiment the SNP-ASO is composed of from about 10 to 25 nucleotide residues. In another embodiment, the SNP-ASOs are labeled with a fluorescent molecule.

[0044] According to yet another aspect of the invention, a composition is provided. The composition includes a plurality of RCGs immobilized on a surface, wherein the RCGs are composed of a plurality of DNA fragments, each DNA fragment including a tag-(N)_(x)-TARGET nucleotide sequence, wherein the TARGET nucleotide sequence is identical in all of the DNA fragments of each RCG, wherein the TARGET nucleotide sequence includes at least 7 nucleotide residues, wherein x is an integer from 0 to 9, and wherein N is any nucleotide residue. In various embodiments, the TARGET nucleotide sequence includes 8, 9, 10, 11, or 12 nucleotide residues. In other embodiments, x is an integer from 3 to 9 (e.g. 6, 7, 8, or 9).

[0045] In one aspect, the invention is a method for identifying a SNP. The method includes preparing a set of primers from a RCG, wherein the RCG is composed of a first set of PCR products, PCR-amplifying a plurality of isolated genomes using the set of primers to yield a second set of PCR products, isolating and, optionally, sequencing the PCR products, and identifying a SNP based on the sequences of one or both sets of PCR products. In one embodiment, the plurality of isolated genomes is a pool of genomes. Preferably, the isolated genomes are RCGs. RCGs can be prepared in a variety of ways, but it is preferred, in some aspects, that the RCG is prepared by DOP-PCR.

[0046] In one embodiment, the method of preparing the set of primers is performed by at least: preparing a RCG, separating the first set of PCR products into individual PCR products, determining the nucleotide sequence of each end of at least one of the PCR products, and generating primers for use in the subsequent PCR step based on the sequence of the ends of the PCR product(s).

[0047] The set of PCR products may be separated by any means known in the art for separating polynucleotides. In a preferred embodiment, the set of PCR products is separated by gel electrophoresis. Preferably, one or more libraries are prepared from segments of the gel containing several PCR products and clones are isolated from the library, each clone including a PCR product from the library. In other embodiments, the set of PCR products is separated by high pressure liquid chromatography or column chromatography.

[0048] The RCG used to generate primers or PCR products for identifying SNPs can be prepared by PCR methods. Preferably, the RCG is prepared by performing DOP-PCR using a degenerate oligonucleotide primer having a tag-(N)_(x)-TARGET nucleotide sequence, wherein the TARGET nucleotide sequence includes at least 7 TARGET nucleotide residues, wherein x is an integer from 0 to 9, and wherein N is any nucleotide residue. In various embodiments, the TARGET nucleotide sequence includes 8, 9, 10, 11, or 12 nucleotide residues. In other embodiments, x is an integer from 3 to 9 (e.g. 6, 7, 8, or 9). In other embodiments, the RCG is prepared by performing DOP-PCR using a degenerate oligonucleotide primer having a tag-(N)_(x)-TARGET nucleotide sequence, wherein the TARGET nucleotide sequence includes fewer than 7 TARGET nucleotide residues, wherein x is an integer from 0 to 9, and wherein N is any nucleotide residue.

[0049] In yet other embodiments, the RCG is prepared by IRS-PCR, AP-PCR, or adapter-PCR.

[0050] In a preferred embodiment of the invention, the set of primers is composed of a plurality of polynucleotides, each polynucleotide including a tag-(N)_(x)-TARGET nucleotide sequence, wherein TARGET is the same sequence in each polynucleotide in the set of primers. The sequence of (N)_(x) is different in each primer within a set of primers. In some embodiments, the set of primers includes at least 4³, 4⁴, 4⁵, 4⁶, 4⁷, 4⁸, or 4⁹ different primers.
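By way of non-limiting illustration, the relationship between x and the size of the primer set (4 raised to the power x, since each N position can be any of four nucleotides) can be shown by enumerating the concrete primers in Python. The function name is hypothetical. The tag and TARGET strings are borrowed from SEQ. ID. NO. 4 on the assumption that its first six residues are the tag and its last eight the TARGET; that split is an assumption of this sketch, and x is set to 3 only to keep the output small.

```python
from itertools import product


def expand_dop_primer(tag, x, target):
    """Yield every concrete primer of the form tag-(N)x-TARGET.

    Each of the x degenerate positions takes one of A, C, G, T, so the
    generator produces 4**x distinct sequences.
    """
    for middle in product("ACGT", repeat=x):
        yield tag + "".join(middle) + target


# Assumed split of SEQ. ID. NO. 4 (CTCGAGNNNNNNAAGCGATG) into tag and TARGET.
primers = list(expand_dop_primer("CTCGAG", 3, "AAGCGATG"))
print(len(primers))   # 4**3 primers for x = 3
print(primers[0], primers[-1])
```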

[0051] In another aspect, the invention is a method for generating a RCG using DOP-PCR. The method includes the step of performing DOP-PCR using a degenerate oligonucleotide primer having an (N)_(x)-TARGET nucleotide sequence, wherein the TARGET nucleotide sequence includes at least 7 TARGET nucleotide residues and wherein x is an integer from 0 to 9, and wherein N is any nucleotide residue. In various embodiments the TARGET nucleotide sequence includes 8, 9, 10, 11, or 12 nucleotide residues. In other embodiments, x is an integer from 3 to 9 (e.g. 6, 7, 8, or 9).

[0052] According to one embodiment, the tag includes 6 nucleotide residues. Preferably the RCG is used in a genotyping procedure. In other embodiments, the RCG is analyzed to detect a polymorphism. The analysis step may be performed using mass spectroscopy.

[0053] In another aspect the invention is a method for assessing whether a subject is at risk for developing a disease. The method includes the steps of using the methods of the invention to identify a plurality of SNPs that occur in at least, for example, 10% of genomes obtained from individuals afflicted with the disease and determining whether one or more of those SNPs occurs in the subject. In the method the affected individuals are compared with the unaffected individuals. Important information can be generated from the observation that there is a difference between affected and unaffected individuals alone.

[0054] In other aspects the invention is a method for identifying a set of one or more SNPs associated with a disease or disease risk. The method includes the steps of preparing individual RCGs obtained from subjects afflicted with a disease, using the same set of primers to prepare each RCG, and comparing the SNP allele frequency identified in those RCGs with the same genetic SNP allele frequency in normal (i.e., non-afflicted) subjects to identify SNPs associated with the disease. In other aspects the invention is a method for identifying a set of SNPs randomly distributed throughout the genome. The set of SNPs is used as a panel of genetic markers to perform a genome-wide scan for linkage analysis.

[0055] In an embodiment, a computer-readable medium having computer-readable signals stored thereon is provided. The signals define a data structure that includes one or more data components. Each data component includes a first data element defining a genomic classification code that identifies a corresponding genome. Each genomic classification code classifies the corresponding genome based on one or more single nucleotide polymorphisms of the corresponding genome.

[0056] In an optional aspect of this embodiment, the genomic classification code is a unique identifier of the corresponding genome.

[0057] In an optional aspect of this embodiment, the genomic classification code is based on a pattern of the single nucleotide polymorphisms of the corresponding genome, where the pattern indicates the presence or absence of each single nucleotide polymorphism.

[0058] In another optional aspect of this embodiment, each data component also includes one or more data elements, each data element defining an attribute of the corresponding genome.
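By way of non-limiting illustration, a data component of the kind described above might be modeled in Python as follows. The class name, field names, attribute keys, and lookup helper are all hypothetical; the sketch assumes only that each data component holds a genomic classification code together with optional attributes of the corresponding genome.

```python
from dataclasses import dataclass, field


@dataclass
class GenomeRecord:
    """One data component: a classification code plus optional attributes."""
    classification_code: str                        # e.g. a presence/absence bit string
    attributes: dict = field(default_factory=dict)  # hypothetical attribute elements


def find_matches(records, code):
    """Return the records whose classification code equals the query code."""
    return [record for record in records if record.classification_code == code]


records = [
    GenomeRecord("1011", {"sample_id": "S1", "species": "human"}),
    GenomeRecord("0110", {"sample_id": "S2"}),
    GenomeRecord("1011"),
]

for record in find_matches(records, "1011"):
    print(record)
```

Where the classification code serves as a unique identifier, a query of this kind would be expected to return at most one record.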

[0059] Each of the embodiments of the invention can encompass various recitations made herein. It is, therefore, anticipated that each of the recitations of the invention involving any one element or combinations of elements can, optionally, be included in each aspect of the invention.

BRIEF DESCRIPTION OF THE DRAWINGS

[0060] FIG. 1 is a schematic flow chart depicting a method according to the invention for identifying SNPs.

[0061] FIG. 2 shows data depicting the process of identifying a SNP: (a) depicts a gel in which inter-Alu PCR genomic DNA products prepared from the 8C primer (which has the nucleotide sequence SEQ ID NO:3) were separated; (b) depicts a gel in which inserts from the library clones were separated; and (c) depicts a filter having two positive or matched clones.

[0062] FIG. 3 depicts the results of a genotyping and mapping experiment: (a) depicts hybridization results obtained using G allele ASO; (b) depicts hybridization results obtained using A allele ASO; (c) is a pedigree of CEPH family #884 with genotypes indicated from (a) and (b); and (d) is a map of chromosome 3q21-23.

[0063] FIG. 4 is a schematic flow chart depicting a method according to the invention for detecting SNPs.

[0064] FIG. 5 is a block diagram of a computer system for storing and manipulating genomic information.

[0065] FIG. 6A is an example of a record for storing information about a genome and/or genes or SNPs within the genome.

[0066] FIG. 6B is an example of a record for storing genomic information.

[0067] FIG. 6C is an example of a record for storing information about genes or SNPs within a genome.

[0068] FIG. 7 is a flow chart of a method for determining whether genomic information of a sample genome such as SNPs match that of another genome.

[0069] FIG. 8 depicts results obtained from a hybridization reaction involving RCGs prepared by DOP-PCR and SNP-ASOs immobilized on a surface in a microarray format.

BRIEF DESCRIPTION OF THE SEQUENCES

[0070] SEQ. ID. NO. 1 is CAGNNNCTG
SEQ. ID. NO. 2 is TTTTTTTTTTCAG
SEQ. ID. NO. 3 is CTT GCA GTG AGC CGA GATC
SEQ. ID. NO. 4 is CTCGAGNNNNNNAAGCGATG
SEQ. ID. NO. 5-697 are nucleotide sequences containing SNPs.

DETAILED DESCRIPTION OF THE INVENTION

[0071] The invention relates in some aspects to genotyping methods involving detection of one or more single nucleotide polymorphisms (SNPs) in a reduced complexity genome (RCG) prepared from the genome of a subject. The invention includes methods of identifying SNPs associated with a disease or with pre-disposition to a disease. The invention further includes methods of screening RCGs prepared from one or more subjects in a population. Such screening can be used, for example, to determine whether the subject is afflicted with, or is likely to become afflicted with, a disorder, to determine allelic frequencies in the population, or to determine degrees of interrelation among subjects in the population. Additional aspects and details of the compositions, kits, and methods of the invention are described in the following sections.

[0072] The invention involves several discoveries which have led to new advances in the field of genotyping. The invention is based on the development of high throughput methods for analyzing genomic diversity. The methods combine use of SNPs, methods for reducing the complexity of genomes, and high throughput screening methods. As discussed in the background of the invention, many prior art methods for genotyping are based on use of hypervariable markers such as Weber markers, which predominantly detect differences in numbers of repeats. Use of a high throughput SNP analysis method is advantageous over the Weber marker system for several reasons. For instance, the results of a Weber analysis system are displayed in the form of a gel, which is difficult to read and must be scored by a professional. The high throughput SNP analysis method of the invention provides a binary result which indicates the presence or absence of the SNP in the sample genome. Additionally, the method of the invention requires significantly less work and is considerably less expensive to perform. As described in the background of the invention, the Weber system requires the performance of 500,000 PCR reactions and use of 5,200 gels to analyze 5,000 genomes. The same study performed using the methods of the invention could be performed without using gels. Additionally, SNPs are not species-specific and therefore the methods of the invention can be performed on diverse species and are not limited to humans. It is more tedious to perform inter-species analysis using Weber markers than using the methods of the invention.

[0073] Some prior art methods do use SNPs for genotyping but the high throughput method of the invention has advantages over these methods as well. Affymetrix utilizes a HuSNP Chip™ system having an ordered array of SNPs immobilized on a surface for analyzing nucleic acids. This system is, however, prohibitively expensive for performing large studies such as the 5,000 genome study described above.

[0074] The invention is useful for identifying polymorphisms within a genome. Another use for the invention involves identification of polymorphisms associated with a plurality of distinct genomes. The distinct genomes may be isolated from populations which are related by some phenotypic characteristic, familial origin, physical proximity, race, class, etc. In other cases, the genomes are selected at random from populations such that they have no relation to one another other than being selected from the same population. In one preferred embodiment, the method is performed to determine the genotype (e.g. SNP content) of subjects having a specific phenotypic characteristic, such as a genetic disease or other trait. Other uses for the methods of the invention involve identification or characterization of a subject, such as in paternity and maternity testing, immigration and inheritance disputes, breeding tests in animals, zygosity testing in twins, tests for inbreeding in humans and animals, evaluation of transplant suitability, such as with bone marrow transplants, identification of human and animal remains, quality control of cultured cells, and forensic testing such as forensic analysis of semen samples, blood stains, and other biological materials. The methods of the invention may also be used to characterize the genetic makeup of a tumor by testing for loss of heterozygosity or to determine the allelic frequency of a particular SNP. Additionally, the methods may be used to generate a genomic classification code for a genome by identifying the presence or absence of each of a panel of SNPs in the genome and to determine the allelic frequency of the SNPs. Each of these uses is discussed in more detail herein.

[0075] The genotyping methods of the invention are based on use of RCGs that can be reproducibly produced. These RCGs are used to identify SNPs, and can be screened individually for the presence or absence of the SNP alleles.

[0076] The invention, in some aspects, is based on the finding that the complexity of the genome can be reduced using various PCR and other genome complexity reduction methods and that RCGs made using such methods can be scanned for the presence of SNPs. One problem with using SNP-ASOs to screen a whole genome (i.e. a genome, the complexity of which has not been reduced) is that the signal to noise (S/N) ratio is low due to the high complexity of the genome and the low relative frequency of occurrence of a particular SNP-specific sequence within the whole genome. When an entire genome of a complex organism is used as the target for allele-specific oligonucleotide hybridization, the target sequence to be detected (e.g. about 17 nucleotide residues for a SNP-ASO) represents only approximately 1 part in 10⁸ to 10⁹ of the DNA sample. It has been discovered, according to the invention, that the complexity of the genome can be reduced in a reproducible manner and that the resulting RCG is useful for identifying the presence of SNPs in the whole genome and for genotyping methods. Reduction in complexity allows genotyping of multiple SNPs following performance of a single PCR reaction, reducing the number of experimental manipulations that must be performed. The RCG is a reliable representation of a specific subfraction of the whole genome, and can be analyzed as though it were a genome of considerably lower complexity.
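The order of magnitude quoted above (roughly 1 part in 10⁸ for a target of about 17 nucleotides) can be checked with simple arithmetic. The sketch below assumes a haploid genome of about 3×10⁹ base pairs and an RCG retaining 1% of the genome; both figures are illustrative assumptions chosen for the calculation, and the function name is hypothetical.

```python
def target_fraction(target_len, dna_len):
    """Fraction of a DNA sample occupied by one target sequence."""
    return target_len / dna_len


GENOME_BP = 3_000_000_000   # assumed haploid genome size (illustrative)
ASO_LEN = 17                # target length mentioned in the text
RCG_FRACTION = 0.01         # assumed: RCG retains 1% of the genome

whole_genome = target_fraction(ASO_LEN, GENOME_BP)
reduced = target_fraction(ASO_LEN, GENOME_BP * RCG_FRACTION)

print(f"whole genome: {whole_genome:.1e}")        # about 5.7e-09
print(f"1% RCG:       {reduced:.1e}")             # about 5.7e-07
print(f"enrichment:   {reduced / whole_genome:.0f}-fold")
```

Under these assumptions, a target that survives complexity reduction is about 100 times more concentrated relative to the rest of the sample, which is the sense in which reduction improves the signal relative to background.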

[0077] RCGs are prepared from isolated genomes. An “isolated genome” as used herein is genomic DNA that is isolated from a subject and may include the entire genomic DNA. For instance, an isolated genome may be a RCG, or it may be an entire genomic DNA sample. Genomic DNA is a population of DNA that comprises the entire genetic component of a species excluding, where applicable, mitochondrial and chloroplast DNA. Of course, the methods of the invention can be used to analyze mitochondrial, chloroplast, etc., DNA as well. Depending on the particular species of the subject, the genomic DNA can vary in complexity. For instance, species which are relatively low on the evolutionary scale, such as bacteria, can have genomic DNA which is significantly less complex than species higher on the evolutionary scale. Bacteria such as E. coli have approximately 2.4×10⁹ grams per mole of haploid genome, and bacterial genomes having a size of less than about 5 million base pairs (5 megabases) are known. Genomes of intermediate complexity, such as those of plants, for instance, rice, have a genome size of approximately 700-1,000 megabases. Genomes of highest complexity, such as maize or humans, have a genome size of approximately 10-10. Humans have approximately 7.4×10¹² grams per mole of haploid genome.

[0078] A “subject” as used herein refers to any type of DNA-containing organism, and includes, for example, bacteria, viruses, fungi, animals, including vertebrates and invertebrates, and plants.

[0079] A “RCG” as used herein is a reproducible fraction of an isolated genome which is composed of a plurality of DNA fragments. The RCG can be composed of random or nonrandom segments or arbitrary or non-arbitrary segments. The term “reproducible fraction” refers to a portion of the genome which encompasses less than the entire native genome. If a reproducible fraction is produced twice or more using the same experimental conditions, the fractions produced in each repetition include at least 50% of the same sequences. In some embodiments the fractions include at least 70%, 80%, 90%, 95%, 97%, or 99% of the same sequences, depending on how the fractions are produced. For instance, if a RCG is produced by PCR, another RCG can be generated under identical experimental conditions having at a minimum greater than 90% of the sequences in the first RCG. Other methods for preparing a RCG, such as size selection, are still considered to be reproducible but often produce less than 99% of the same sequences.
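The reproducibility criterion above can be sketched as a set comparison between two preparations made under the same conditions. The representation of an RCG as a set of fragment identifiers, the function names, and the choice to measure overlap relative to the first preparation are assumptions made for illustration only.

```python
def shared_fraction(first_rcg, second_rcg):
    """Fraction of the first preparation's sequences also found in the second."""
    if not first_rcg:
        raise ValueError("first preparation is empty")
    return len(first_rcg & second_rcg) / len(first_rcg)


def is_reproducible(first_rcg, second_rcg, threshold=0.5):
    """Apply a minimum shared-sequence threshold (0.5 corresponds to 50%)."""
    return shared_fraction(first_rcg, second_rcg) >= threshold


# Hypothetical fragment identifiers from two repeat preparations.
run_1 = {"frag_01", "frag_02", "frag_03", "frag_04"}
run_2 = {"frag_01", "frag_02", "frag_03", "frag_09"}

print(shared_fraction(run_1, run_2))            # 0.75
print(is_reproducible(run_1, run_2))            # True at the 50% level
print(is_reproducible(run_1, run_2, 0.9))       # False at the 90% level
```

Raising the threshold argument reflects the stricter embodiments (70% through 99%) listed in the definition.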

[0080] A “plurality” of elements, as used throughout the application, refers to 2 or more of the element. A “DNA fragment” is a polynucleotide sequence obtained from a genome at any point along the genome and encompassing any sequence of nucleotides. The DNA fragments of the invention can be generated according to either of two types of mechanisms, and thus there are two types of RCGs: PCR-generated RCGs and native RCGs.

[0081] PCR-generated RCGs are randomly primed. That is, the polynucleotide fragments in the PCR-generated RCG all have common sequences at or near the 5′ and 3′ ends of the fragment, but the remaining nucleotides within the fragments do not have any sequence relation to one another. (When a tag is used in the primer, all of the 5′ and 3′ ends are identical. When a tag is not used, the 5′ and 3′ ends have a series of N's followed by the TARGET sequence, reading in a 5′ to 3′ direction. The TARGET sequence is identical in each primer, with the exception of multiple-primed DOP-PCR.) Thus, each polynucleotide fragment in a RCG includes a common 5′ and 3′ sequence which is determined by the constant region of the primer used to generate the RCG. For instance, if the RCG is generated using DOP-PCR (described in more detail below), each polynucleotide fragment would have near the 5′ or 3′ end nucleotides that are determined by the “TARGET nucleotide sequence”. The TARGET nucleotide sequence is a sequence which is selected arbitrarily but which is constant within a set or subset (e.g. multiple primed DOP-PCR) of primers. Thus, each polynucleotide fragment can have the same nucleotide sequence near the 5′ and 3′ end arising from the same TARGET nucleotide sequence. In some cases more than one primer can be used to generate the RCG. When more than one primer is used, each member of the RCG would have a 5′ and 3′ end in common with at least one other member of the RCG and, more preferably, each member of the RCG would have a 5′ and 3′ end in common with at least 5% of the other members of the RCG. For example, if a RCG is prepared using DOP-PCR with 2 different primers having different TARGET nucleotide sequences, a population containing four sets of PCR products having common ends could be generated. One set of PCR products could be generated having the TARGET nucleotide sequence of the first primer at or near both the 5′ and 3′ ends and another set could be generated having the TARGET nucleotide sequence of the second primer at or near both the 5′ and 3′ ends. Another set of PCR products could be generated having the TARGET nucleotide sequence of the second primer at or near the 5′ end and the TARGET nucleotide sequence of the first primer at or near the 3′ end. A fourth set of PCR products could be generated having the TARGET nucleotide sequence of the second primer at or near the 3′ end and the TARGET nucleotide sequence of the first primer at or near the 5′ end. The PCR-generated genomes are composed of synthetic DNA fragments.
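The four sets of products described above for two primers follow from pairing each possible 5′ end with each possible 3′ end. The sketch below enumerates those pairings; the labels "TARGET_1" and "TARGET_2" are placeholders for the two TARGET nucleotide sequences and the function name is hypothetical.

```python
from itertools import product


def end_combinations(targets):
    """List every (5' end, 3' end) pairing of TARGET sequences."""
    return list(product(targets, repeat=2))


# Placeholder labels for the TARGET sequences of two DOP-PCR primers.
for five_prime, three_prime in end_combinations(["TARGET_1", "TARGET_2"]):
    print(f"5' end: {five_prime}   3' end: {three_prime}")

# Two primers give 2 x 2 = 4 sets; n primers would give n * n pairings.
print(len(end_combinations(["TARGET_1", "TARGET_2"])))  # 4
```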

[0082] The DNA fragments of the native RCGs have arbitrary sequences. That is, the polynucleotide fragments in the native RCG do not necessarily have any sequence relation to another fragment of the same RCG. These sequences are selected based on other properties, such as size or secondary characteristics. These sequences are referred to as native RCGs because they are prepared from native nucleic acid preparations rather than being synthesized. Thus they are native, non-synthetic DNA fragments. The fragments of the native RCG may share some sequence relation to one another (e.g. if produced by restriction enzymes). In some embodiments they do not share any sequence relation to one another.

[0083] In some preferred embodiments, the RCG includes a plurality of DNA fragments ranging in size from approximately 200 to 2,000 nucleotide residues. In a preferred embodiment, a RCG includes from 95 to 0.05% of the intact native genome. The fraction of the isolated genome which is present in the RCG of the invention represents at most 90% of the isolated genome, and in preferred embodiments, contains less than 50%, 40%, 30%, 20%, 10%, 5%, or 1% of the genome. A RCG preferably includes between 0.05 and 1% of the intact native genome. In a preferred embodiment, the RCG encompasses 10% or less of an intact native genome of a complex organism.

[0084] Genomic DNA can be isolated from a tissue sample, a whole organism, or a sample of cells. Additionally, the isolated genomes of the invention are preferably substantially free of proteins that interfere with PCR or hybridization processes, and are also substantially free of proteins that damage DNA, such as nucleases. Preferably, the isolated genomes are also free of non-protein inhibitors of polymerase function (e.g. heavy metals) and non-protein inhibitors of hybridization when the PCR-generated RCGs are formed. Proteins may be removed from the isolated genomes by many methods known in the art. For instance, proteins may be removed using a protease, such as proteinase K or pronase, by using a strong detergent such as sodium dodecyl sulfate (SDS) or sodium lauryl sarcosinate (SLS) to lyse the cells from which the isolated genomes are obtained, or both. Lysed cells may be extracted with phenol and chloroform to produce an aqueous phase containing nucleic acid, including the isolated genomes, which can be precipitated with ethanol.

[0085] Several methods can be used to generate PCR-generated RCGs, including IRS-PCR, AP-PCR, DOP-PCR, multiple primed PCR, and adaptor-PCR. Hybridization conditions for particular PCR methods are selected in the context of the primer type and primer length to yield a set of DNA fragments which is a percentage of the genome, as defined above. PCR methods have been described in many references, see e.g., U.S. Pat. Nos. 5,104,792; 5,106,727; 5,043,272; 5,487,985; 5,597,694; 5,731,171; 5,599,674; and 5,789,168. Basic PCR methods have been described in e.g., Saiki et al., Science, 230: 1350 (1985) and U.S. Pat. Nos. 4,683,195, 4,683,202 (both issued Jul. 28, 1987) and U.S. Pat. No. 4,800,159 (issued Jan. 24, 1989).

[0086] The PCR methods described herein are performed according to PCR methods well-known in the art. For instance, U.S. Pat. No. 5,333,675, issued to Mullis et al., describes an apparatus and method for performing automated PCR. In general, performance of a PCR method results in amplification of a selected region of DNA by providing two DNA primers, each of which is complementary to a portion of one strand within the selected region of DNA. Each primer is hybridized to a template strand of nucleic acid in the presence of deoxyribonucleotide triphosphates (dATP, dCTP, dGTP, and dTTP) and a chain extender enzyme, such as DNA polymerase. The primers are hybridized with the separated strands, forming DNA molecules that are single stranded except for the region hybridized with the primer, where they are double stranded. The double stranded regions are extended by the action of the chain extender enzyme (e.g. DNA polymerase) to form an extended double stranded molecule between the original two primers. The double stranded DNA molecules are separated to produce single strands which can then be re-hybridized with the primers. The process is repeated for a number of cycles to generate a series of DNA strands having the same nucleotide sequence between and including the primers.

[0087] Chain extender enzymes are well known in the art and include, for example, E. coli DNA polymerase I, Klenow fragment of E. coli DNA polymerase I, T4 DNA polymerase, T7 DNA polymerase, recombinant modified T7 DNA polymerase, reverse transcriptase, and other enzymes. Heat stable enzymes are particularly preferred as they are useful in automated thermal cycle equipment. Heat stable polymerases include, for example, DNA polymerases isolated from Bacillus stearothermophilus (Bio-Rad), Thermus thermophilus (Finnzymes, ATCC number 27634), Thermus species (ATCC number 31674), Thermus aquaticus strain TV11518 (ATCC number 25105), Sulfolobus acidocaldarius, described by Bukhrashvili et al., Biochim. Biophys. Acta, 1008:102-07 (1989), Thermus filiformis (ATCC number 43280), Taq DNA polymerase, commercially available from Perkin-Elmer-Cetus (Norwalk, Conn.), Promega (Madison, Wis.) and Stratagene (La Jolla, Calif.), and AmpliTaq™ DNA polymerase, a recombinant Thermus aquaticus Taq DNA polymerase, available from Perkin-Elmer-Cetus and described in U.S. Pat. No. 4,889,818.

[0088] Preferably, the PCR-based RCG generation methods performed according to the invention are automated and performed using thermal cyclers. Many types of thermal cyclers are well-known in the art. For instance, M.J. Research (Watertown, Mass.) provides a thermal cycler having a Peltier heat pump to provide precise uniform temperature control in the thermal cyclers; DeltaCycler thermal cyclers from Ericomp (San Diego, Calif.) also are Peltier-based and include automatic ramping control, time/temperature extension programming and a choice of tube or microplate configurations. The RoboCycler™ by Stratagene (La Jolla, Calif.) incorporates robotics to produce rapid temperature transitions during cycling and well-to-well uniformity between samples; and a particularly preferred cycler is the Perkin-Elmer Applied Biosystems (Foster City, Calif.) ABI Prism™ 877 Integrated Thermal cycler, which is operated through a programmable interface that automates liquid handling and thermocycling processes for fluorescent DNA sequencing and PCR reactions. The Perkin-Elmer Applied Biosystems machine is designed specifically for high-throughput genotyping projects and fully automates genotyping steps, including PCR product pooling.

[0089] Degenerate oligonucleotide primed-PCR (DOP-PCR) involves use of a single primer set, wherein each primer of the set is typically composed of 3 parts. A DOP-PCR primer as used herein can have the following structure:

[0090] 5′-tag-(N)_(x)-TARGET-3′

[0091] The “TARGET” nucleotide sequence includes at least 5 arbitrarily selected nucleotide residues that are the same for each primer of the set. x is an integer from 0 to 9, and N is any nucleotide residue. The value of x is preferably the same for each primer of a DOP-PCR primer set. In other embodiments, the TARGET nucleotide sequence includes at least 6 or 7 and preferably at least 8, 9, or 10 arbitrarily-selected nucleotides. The tag is optional.

[0092] A “TARGET nucleotide sequence”, as used herein, is selected arbitrarily. A set of primers is used to generate a particular RCG. Each primer in the set includes the same TARGET nucleotide sequence as the other primers. Of course, sets of primers having different TARGET sequences can be combined.

[0093] The “tag”, as used herein, is a sequence which is useful for processing the RCG but not necessary. The tag, unlike the other sequences in the primer, does not necessarily hybridize with genomic DNA during the initial round of genomic PCR amplification. In later amplification rounds, the tag hybridizes with PCR-amplified DNA. Thus, the tag does not contribute to the sequence initially recognized by the primer. Since the tag does not participate in the initial hybridization reaction with genomic DNA, but is involved in the primer extension process, the PCR products that are formed (i.e., the reproducible DNA fragments) include the tag sequence. Thus, the end products are DNA fragments that have a sequence identical to a sequence found in the genome except for the tag sequence. The tag is useful because in later rounds of PCR it allows use of a higher annealing temperature than could otherwise be used with shorter oligonucleotides. The arbitrarily selected sequence is positioned at the 3′ end of the primer. This sequence, although arbitrarily selected, is the same for each primer in a set of DOP-PCR primers. From 0 to 9 nucleotide residues (“N” in the formula above) are located at the 5′ end of the TARGET sequence in the DOP-PCR primers of the invention. Each of these residues can be independently selected from naturally-occurring or artificial nucleotide residues. By way of example, each “N” residue can be an inosine or methylcytosine residue. In the formula, “x” is an integer that can be from 0 to 9, and is preferably from 3 to 9 (e.g. 3, 4, 5, 6, 7, 8, or 9). Each set of DOP-PCR primers of the invention can thus contain up to 4^(x) unique primers (i.e., 1, 4, 16, 64, . . . , 262144 primers for x = 0, 1, 2, 3, . . . , 9). Finally, a base pair tag can be positioned at the 5′ end of the primer. This tag can optionally include a restriction enzyme site. In general, inclusion of a tag sequence in the DOP-PCR primers of the invention is preferred, but not necessary.
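The primer structure 5′ tag-(N)_(x)-TARGET 3′ and the 4^(x) count given above can be illustrated by expanding the degenerate positions. The sketch below restricts each N to the four standard bases (the text also permits artificial residues), and it reads SEQ ID NO: 4 (CTCGAGNNNNNNAAGCGATG) as a tag "CTCGAG", six N positions, and a TARGET "AAGCGATG"; that split and the function names are assumptions made for illustration.

```python
from itertools import product

BASES = "ACGT"  # assumption: each N is one of the four standard bases


def primer_count(x):
    """Maximum number of unique primers for x degenerate positions."""
    return len(BASES) ** x


def dop_primer_set(tag, x, target):
    """Expand tag-(N)x-TARGET into every concrete primer sequence."""
    return [tag + "".join(n) + target for n in product(BASES, repeat=x)]


# Small demonstration with x = 2 so the full set is easy to inspect.
primers = dop_primer_set("CTCGAG", 2, "AAGCGATG")
print(len(primers))      # 16
print(primers[0])        # CTCGAGAAAAGCGATG

# Counts for x = 0..9; the last value is 262144, as stated in the text.
print([primer_count(x) for x in range(10)])
```

Every primer in the expanded set ends with the same TARGET, which is what gives the resulting fragments their common ends.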

[0094] The initial rounds of DOP-PCR are preferably performed at a low temperature given that the specificity of the reaction will be determined by only the 3′ TARGET nucleotide sequence. A slow ramp time during these cycles ensures that the primers do not detach from the template before being extended. Subsequent rounds are carried out at a higher annealing temperature because in the subsequent rounds the 5′ end of the DOP-PCR primer (the tag) is able to contribute to the primer annealing. A PCR cycle performed under low stringency hybridization conditions is generally performed at from about 35° C. to about 55° C.

[0095] Because DOP-PCR involves a randomly chosen sequence, the resultant PCR products are generated from genome sequences arbitrarily distributed throughout the genome and will generally not be clustered within specific sites of the genome. Additionally, creation of new sets of DOP-PCR-amplified DNA fragments can be easily accomplished by changing the sequence, length, or both, of the primer. RCGs having greater or lesser complexity can be generated by selecting DOP-PCR primers having shorter or longer, respectively, TARGET and (N)_(x) nucleotide sequences. This approach can also be used with multiple DOP-PCR primers such as in the “multiple-primed DOP-PCR” method (described below). Finally, use of arbitrarily chosen sequences in DOP-PCR is useful in many species because the arbitrarily-selected sequences are not species-specific, as with some forms of PCR which require use of a specific known sequence.

[0096] Another method for generating a PCR-generated RCG involves interspersed repeat sequence PCR (IRS-PCR). Mammalian chromosomes include both repeated and unique sequences. Some of the repeated sequences are short interspersed repeated sequences (IRS's) and others are long IRS's. One major family of short IRS's found in humans includes Alu repeat sequences. Amplification using a single Alu primer will occur whenever two Alu elements lie in inverted orientation to each other on opposite strands. There are believed to be approximately 900,000 Alu repeats in a human haploid genome. Another type of IRS sequence is the L1 element (most common is L1Hs), which is present in 10⁴-10⁵ copies in a human genome. Because the L1 sequence is less abundant in the genome than the Alu sequence, fewer amplification products are produced upon amplification using an L1 primer. In IRS-PCR, a primer which has homology to a repetitive sequence present on opposite strands within the genome of the species to be analyzed is used. When two repeat elements having the primer sequence are present in a head-to-head fashion within a limited distance (approximately 2000 nucleotide residues), the inter-repeat sequence can be amplified. The method has the advantage that the complexity of the resulting PCR products can be controlled by how homologous the primer chosen is with the repeat consensus (that is, the more homologous the primer is with the repeat consensus sequence, the more complex the PCR product will be).

[0097] In general, an IRS-PCR primer has a sequence wherein at least a portion of the primer is homologous with (e.g. 50%, 75%, 90%, 95% or more identical to) the consensus nucleotide sequence of an IRS of the subject.

[0098] In mammalian genomes, small interspersed repeat sequences (SINES) are present in extremely high copy number and are often configured such that a single copy sequence of between 500 nucleotide residues and 1000 nucleotide residues is situated between two repeats which are oriented in a head-to-head or tail-to-tail manner. Genomic DNA sequences having this configuration are substrates for Alu PCR in human DNA and B1 and B2 PCR in the mouse. The precise number of products which are represented in a specific Alu, B1, or B2 PCR reaction depends on the choice of primer used for the reaction. This variation in product complexity is due to the variation in sequence among the large number of representative sequences of the IRS family in each species. A detailed study of this variation was described by Britten (Britten, R. J. (1994), Proc. Natl. Acad. Sci. USA, 91:5992-5996). In the Britten study, the sequence variation for each nucleotide residue of the Alu consensus sequence was analyzed for 1574 human Alu sequences. The complexity of Alu PCR products generated by amplification using a given Alu PCR primer can be predicted to a significant extent based on the degree to which the nucleotide sequence of the primer matches consensus nucleotide sequences. As a general rule, Alu PCR products become progressively less complex as the primer sequence diverges from the Alu consensus. Because two hybridized primers are required at each site for which Alu PCR is to be accomplished, it is predictable that linear variation in the number of genomic sites to which a primer may bind will be reflected in the complexity of PCR products, which is roughly proportional to the square of primer binding efficiency. This prediction conforms to experimental results, permitting synthesis of Alu PCR products having a wide range of product complexity values. Therefore, when it is desirable to reduce the number of PCR products obtained using Alu PCR, the primer sequence should be designed to diverge by a predictable amount from the Alu consensus sequence.
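The stated relationship, product complexity roughly proportional to the square of primer binding efficiency, can be written as a small calculation. The sketch below treats binding efficiency as a fraction between 0 and 1 relative to a primer that matches the consensus perfectly; that normalization, the proportionality constant of 1, and the function names are assumptions for illustration.

```python
import math


def relative_complexity(binding_efficiency):
    """Relative product complexity, assuming complexity ~ efficiency squared."""
    if not 0.0 <= binding_efficiency <= 1.0:
        raise ValueError("binding efficiency must be between 0 and 1")
    return binding_efficiency ** 2


def required_efficiency(target_complexity):
    """Binding efficiency needed to reach a desired relative complexity."""
    if not 0.0 <= target_complexity <= 1.0:
        raise ValueError("target complexity must be between 0 and 1")
    return math.sqrt(target_complexity)


for eff in (1.0, 0.5, 0.1):
    print(f"efficiency {eff:.2f} -> relative complexity {relative_complexity(eff):.4f}")

# A 100-fold reduction in complexity needs only a 10-fold drop in binding.
print(round(required_efficiency(0.01), 6))
```

The square arises because a product forms only where two primer binding events occur at the same site, so a modest divergence from the consensus produces a much larger drop in the number of products.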

[0099] Another method for generating a RCG involves arbitrarily primed PCR (AP-PCR). AP-PCR utilizes short oligonucleotides as PCR primers to amplify a discrete subset of portions of a high complexity genome. For AP-PCR, the primer sequence is arbitrary and is selected without knowledge of the sequence of the target nucleic acids to be amplified. The arbitrary primer is generally 50-60% G+C. The AP-PCR method is similar to the DOP-PCR method described above, except that the AP-PCR primer consists of only the arbitrarily-selected nucleotides and not the 5′ flanking degenerate residues (i.e. the N_(x) residues described for the DOP-PCR primers) or the tag. The genome may be primed using a single arbitrary primer or a combination of two or more arbitrary primers, each having a different, but optionally related, sequence.

[0100] AP-PCR is performed under low stringency hybridization conditions, allowing hybridization of the primer with targets with which the primer can exhibit a substantial degree of mismatching. A PCR cycle performed under low stringency hybridization conditions is generally performed at from about 35° C. to about 55° C. Mismatches refer to non-complementary nucleotide bases in the primer, relative to the template with which it is hybridized.

[0101] AP-PCR methods have been used previously in combination with gel electrophoresis to determine genotypes. AP-PCR products are generally fractionated on a high resolution polyacrylamide gel, and the presence or absence of specific bands is used to genotype a specific locus. In general, the difference between the presence and absence of a band is a consequence of a single nucleotide DNA sequence difference in one of the primer binding sites for a given single copy sequence.

[0102] The product complexity obtained using a given primer or primer set can be determined by several methods. For instance, the product complexity can be determined using PCR amplification of a panel of human yeast artificial chromosome (YAC) DNA samples from a CEPH 1 library. These YACs each carry a human DNA segment approximately 300-400 kilobase pairs in length. Product complexity for each primer set can be inferred by comparing the number of bands produced per YAC when analyzed on agarose gel with an IRS-PCR product of known complexity. Additionally, for products of relatively low complexity, electrophoresis on polyacrylamide gels can establish the product complexity, compared to a standard. Alternatively, an effective way to estimate the complexity of the product is to carry out a reannealing reaction using resistance to S1 nuclease-catalyzed degradation to determine the rate of reannealing of internally labeled, denatured, double-stranded DNA product. Comparison with reannealing rates of standards of known complexity permits accurate estimation of product complexity. Each of these three methods may be used for IRS-PCR. The second and third methods are best for AP-PCR and DOP-PCR which, unlike IRS-PCR, will not selectively amplify human DNA from a crude YAC DNA preparation.

[0103] The complexity of PCR products generated by AP-PCR can be regulated by selecting the primer sequence length, the number of primers in a primer set, or some combination of these. By choosing the appropriate combination, AP-PCR may also be used to reduce the complexity of a genome for SNP identification and genotyping, as described herein. AP-PCR markers are different from Alu PCR primers, have a different genomic distribution, and can therefore complement an IRS-PCR genome complexity-reducing method. The methods can be used in combination to produce complementary information from genome scans.

[0104] One PCR method for preparing RCGs is an adapter-linker amplification PCR method (previously described in e.g., Saunders et al., Nuc. Acids Res., 17: 9027 (1990); Johnson, Genomics, 6: 243 (1990); and PCT Application WO90/00434, published Aug. 9, 1990). In this method, genomic DNA is digested using a restriction enzyme, and a set of linkers is ligated onto the ends of the resulting DNA fragments. PCR amplification of genomic DNA is accomplished using a primer which can bind with the adapter linker sequence. Two possible variations of this procedure which can be used to limit genome complexity are (a) to use a restriction enzyme which produces a set of fragments which vary in length such that only a subset (e.g. those smaller than a PCR-amplifiable length) are amplified; and (b) to digest the genomic DNA using a restriction enzyme that produces an overhang of random nucleotide sequence (e.g., AlwN1 recognizes CAGNNNCTG (SEQ ID NO: 1) and cleaves between NNN and CTG). Adapters are constructed to anneal with only a subset of the products. For example, in the case of AlwN1, adapters having a specific 3 nucleotide residue overhang (corresponding to the random 3 base pair sequence produced by the restriction enzyme digestion) would be used to yield a 64-fold (4³) reduction in complexity. Fragments which have an overhang sequence complementary to the adapter overhang are the only ones which are amplified.
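The 64-fold figure for a 3-nucleotide random overhang, and the selection of fragments by adapter complementarity, can be sketched as follows. Fragments are modeled simply as (name, overhang) pairs; the fragment names, the example overhangs, and the function names are hypothetical and are used only to show the selection logic.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def all_overhangs(length):
    """Every possible overhang of the given length over A, C, G, T."""
    return ["".join(p) for p in product("ACGT", repeat=length)]


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def amplifiable(fragments, adapter_overhang):
    """Keep fragments whose overhang is complementary to the adapter overhang."""
    wanted = reverse_complement(adapter_overhang)
    return [name for name, overhang in fragments if overhang == wanted]


print(len(all_overhangs(3)))  # 64 possible 3-nucleotide overhangs

# Hypothetical digestion products, each with its 3-nucleotide overhang.
fragments = [("frag_1", "CTT"), ("frag_2", "GGA"), ("frag_3", "CTT")]
print(amplifiable(fragments, "AAG"))  # ['frag_1', 'frag_3']
```

If overhang sequences were evenly distributed, one adapter would capture about 1 in 64 fragments, which is the reduction described above.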

[0105] Another method for generating RCGs is based on the development of native RCGs. Several methods can be used to generate native RCGs, including DNA fragment size selection, isolating a fraction of DNA from a sample which has been denatured and reannealed, pH-separation, separation based on secondary structure, etc.

[0106] Size selection can be used to generate a RCG by separating polynucleotides in a genome into different fractions wherein each fraction contains polynucleotides of an approximately equal size. One or more fractions can be selected and used as the RCG. The number of fractions selected will depend on the method used to fragment the genome and to fractionate the pieces of the genome, as well as the total number of fractions. In order to increase the complexity of the RCG, more fractions are selected. One method of generating a RCG involves fragmenting a genome into arbitrarily sized pieces and separating the pieces on a gel (or by HPLC or another size fractionation method). A portion of the gel is excised, and DNA fragments contained in the portion are isolated. Typically, restriction enzymes can be used to produce DNA fragments in a reproducible manner.

[0107] Separation based on secondary structure can be accomplished in a manner similar to size selection. Different fractions of a genome having secondary structure can be separated on a gel. One or more fractions are excised from the gel, and DNA fragments are isolated therefrom.

[0108] Another method for creating a native RCG involves isolating a fraction of DNA from a sample which has been denatured and reannealed. A genomic DNA sample is denatured, and denatured nucleic acid molecules are allowed to reanneal under selected conditions. Some conditions allow more of the DNA to be reannealed than other conditions. These conditions are well known to those of ordinary skill in the art. Either the reannealed or the remaining denatured fractions can be isolated. It is desirable to select the smaller of these two fractions in order to generate a RCG. The reannealing conditions used in the particular reaction determine which fraction is the smaller fraction. Variations of this method can also be used to generate RCGs. For instance, once a portion of the fraction is allowed to reanneal, the double stranded DNA may be removed (e.g., using column chromatography), the remaining DNA can then be allowed to partially reanneal, and the reannealed fraction can be isolated and used. This variation is particularly useful for removing repetitive elements of the DNA, which rapidly reanneal.

[0109] The amount of isolated genome used in the method of preparing RCGs will vary, depending on the complexity of the initial isolated genome. Genomes of low complexity, such as bacterial genomes having a size of less than about 5 million base pairs (5 megabases), usually are used in an amount from approximately 10 picograms to about 250 nanograms. A more preferred range is from 30 picograms to about 7.5 nanograms, and even more preferably, about 1 nanogram. Genomes of intermediate complexity, such as plants (for instance, rice, having a genome size of approximately 700-1,000 megabases) can be used in a range of from approximately 0.5 nanograms to 250 nanograms. More preferably, the amount is between 1 nanogram and 50 nanograms. Genomes of highest complexity (such as maize or humans, having a genome size of approximately 3,000 megabases) can be used in an amount from approximately 1 nanogram to 250 nanograms (e.g. for PCR).
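To illustrate why the amount scales with genome complexity, the mass of DNA can be converted into an approximate number of genome copies. The sketch below assumes an average mass of about 650 g/mol per base pair of double stranded DNA, which is a common approximation and not a value taken from this description; the function name is hypothetical.

```python
AVOGADRO = 6.022e23          # molecules per mole
GRAMS_PER_MOL_PER_BP = 650.0  # assumed average mass of one base pair of dsDNA


def genome_copies(mass_ng, genome_size_bp):
    """Estimate how many copies of a genome are present in a given DNA mass."""
    grams_per_genome = genome_size_bp * GRAMS_PER_MOL_PER_BP / AVOGADRO
    return (mass_ng * 1e-9) / grams_per_genome


# 1 nanogram of a ~3,000 megabase genome versus a ~5 megabase genome.
print(round(genome_copies(1.0, 3_000e6)))
print(round(genome_copies(1.0, 5e6)))
```

Under this approximation, 1 nanogram of a 3,000 megabase genome corresponds to only a few hundred genome copies, whereas 1 nanogram of a 5 megabase bacterial genome corresponds to well over one hundred thousand copies, which is consistent with smaller genomes being usable at picogram amounts.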

[0110] In addition to the DOP-PCR methods described above, PCR-generated RCGs can be prepared using DOP-PCR involving multiple primers, which is referred to herein as “multiple-primed-DOP-PCR”. Multiple-primed-DOP-PCR involves the use of at least two primers which are arranged similarly to the single primers discussed above and are typically composed of 3 parts. A multiple-primed-DOP-PCR primer as used herein has the following structure:

[0111] tag-(N)_(x)-TARGET₂

[0112] The TARGET₂ nucleotide sequence includes at least 5, and preferably at least 6, TARGET nucleotide residues, x is an integer from 0 to 9, and N is any nucleotide residue.

[0113] The sequence chosen arbitrarily and positioned at the 3′ end of the primer can be manipulated in multiple-primed-DOP-PCR to produce a different end product than for DOP-PCR because use of two or more sets of primers adds another level of diversity, thus producing a RCG or amplified genome, depending on the primers chosen. Each of the at least two sets of primers of multiple-primed-DOP-PCR has a different TARGET sequence. Similar to the single primer of DOP-PCR, a set of primers is generated for each of the at least two primers, and every primer within a single set has the same TARGET sequence as the other primers of the set. This TARGET sequence is flanked at its 5′ end by 0 to 9 nucleotide residues (“N”s). The set of N's will differ from primer to primer within a set of primers. A set of primers may include up to 4^(x) different primers, each primer having a unique (N)_(x) sequence. Finally, a tag can be positioned at the 5′ end.
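The tag-(N)_(x)-TARGET structure and the 4^(x) bound on the size of a primer set can be sketched as follows. The tag and TARGET sequences used here are hypothetical placeholders chosen only for illustration; they are not sequences disclosed in this description.

```python
from itertools import product


def primer_set(tag, x, target):
    """Enumerate all primers of the form tag-(N)x-TARGET for one TARGET sequence.

    Every primer in the set shares the same tag and TARGET and differs only in
    its (N)x stretch, so the set contains 4**x primers.
    """
    return [tag + "".join(n) + target for n in product("ACGT", repeat=x)]


# Two sets with different (hypothetical) TARGET sequences, as in
# multiple-primed-DOP-PCR. x is kept small so the sets are easy to inspect.
TAG = "GGAATTC"  # hypothetical 5' tag
set_one = primer_set(TAG, 2, "ATGTGG")
set_two = primer_set(TAG, 2, "CTCACA")

print(len(set_one), set_one[:3])
print(len(set_two), set_two[:3])
```

With x = 2 each set holds 16 primers; with x = 0 the set collapses to the single primer tag-TARGET. In practice such degenerate positions would typically be produced by mixed-base synthesis rather than by synthesizing each primer individually.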

[0114] In other aspects of the invention, methods for identifying SNPs can be performed using RNA genomes rather than RCGs. RNA genomes differ from RCGs in that they are generated from RNA rather than from DNA. An RNA genome can be, for instance, a cDNA preparation made by reverse transcription of RNA obtained from cells of a subject (e.g. human ovarian carcinoma cells). Thus, an RNA genome can be composed of DNA sequences, as long as the DNA is derived from RNA. RNA can also be used directly.

[0115] The genotyping and other methods of the invention can also be performed using a RNA genotyping method. This method involves use of RNA, rather than DNA, as the source of nucleic acid for genotyping. In this embodiment, RNA is reverse transcribed (e.g. using a reverse transcriptase) to produce cDNA for use as an RNA genome. The RNA method has at least one advantage over DNA-based methods. SNPs in coding regions (cSNPs) are more likely to be directly involved in detectable phenotypes and are thus more likely to be informative with regard to how such phenotypes can be affected. Furthermore, since this method can require only a reverse transcription step, it is amenable to high-throughput analysis. In a preferred embodiment, a reverse transcriptase primer which only binds a subset of RNA species (e.g. a dT primer having a 3-base anchor, e.g. TTTTTTTTTT CAG; SEQ ID NO: 2) is used to further reduce RNA genome complexity (48-fold using the dT primer having a 3-base anchor). In the RNA-genotyping method of the invention, the RNA/cDNA sample can be attached to a surface and hybridized with a SNP-ASO.

[0116] In another aspect, the invention includes a method for identifying a SNP. Genomic fragments which include SNPs can be prepared according to the invention by preparing a set of primers from a RCG (e.g., a RCG composed of a set of PCR products), performing PCR using the set of primers to amplify a plurality of isolated genomes to produce DNA products, and identifying SNPs included in the DNA products. The presence of a SNP in the DNA product can be identified using methods such as direct sequencing, i.e. using dideoxy chain termination or Maxam-Gilbert sequencing (see e.g., Sambrook et al., “Molecular Cloning: A Laboratory Manual,” Cold Spring Harbor Laboratory, 1989, New York; or Zyskind et al., Recombinant DNA Laboratory Manual, Acad. Press, 1988), denaturing gradient gel electrophoresis to identify different sequence dependent melting properties and electrophoretic migration of SNP-containing DNA fragments (see e.g., Erlich, ed., PCR Technology, Principles and Applications for DNA Amplification, Freeman and Co., NY, 1992), and conformation analysis to differentiate sequences based on differences in electrophoretic migration patterns of single stranded DNA products (see e.g., Orita et al., Proc. Nat. Acad. Sci. 86, 2766-2770, 1989). In preferred embodiments, the SNPs are identified based on the sequences of the polymerase chain reaction products identified using sequencing methods.

[0117] A “single nucleotide polymorphism” or “SNP” as used herein is a single base pair (i.e., a pair of complementary nucleotide residues on opposite genomic strands) within a DNA region wherein the identities of the paired nucleotide residues vary from individual to individual. At the variable base pair in the SNP, two or more alternative base pairings occur at a relatively high frequency (greater than 1%) in a subject (e.g. human) population.

[0118] A “polymorphic region” is a region or segment of DNA the nucleotide sequence of which varies from individual to individual. The two DNA strands which are complementary to one another except at the variable position are referred to as alleles. A polymorphism is allelic because some members of a species have one allele and other members have a variant allele and some have both. When only one variant sequence exists, a polymorphism is referred to as a diallelic polymorphism. There are three possible genotypes in a diallelic polymorphic DNA in a diploid organism. These three genotypes arise because it is possible that a diploid individual's DNA may be homozygous for one allele, homozygous for the other allele, or heterozygous (i.e. having one copy of each allele). When other mutations are present, it is possible to have triallelic or higher order polymorphisms. These multiple mutation polymorphisms produce more complicated genotypes.
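The count of three genotypes for a diallelic polymorphism, and the larger counts for triallelic or higher order polymorphisms, can be checked by enumerating unordered pairs of alleles in a diploid organism. This is a small illustrative sketch; the allele labels are arbitrary.

```python
from itertools import combinations_with_replacement


def diploid_genotypes(alleles):
    """List every unordered pair of alleles a diploid individual can carry."""
    return ["/".join(pair) for pair in combinations_with_replacement(alleles, 2)]


print(diploid_genotypes(["A", "G"]))       # two homozygotes and one heterozygote
print(len(diploid_genotypes(["A", "G", "T"])))
```

For k alleles the number of diploid genotypes is k(k+1)/2, giving 3 for a diallelic polymorphism and 6 for a triallelic one.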

[0119] SNPs are well-suited for studying sequence variation because they are relatively stable (i.e. they exhibit low mutation rates) and because it appears that SNPs can be responsible for inherited traits. These properties make SNPs particularly useful as genetic markers for identifying disease-associated genes. SNPs are also useful for such purposes as linkage studies in families, determining linkage disequilibrium in isolated populations, performing association analysis of patients and controls, and loss of heterozygosity studies in tumors.

[0120] An exemplary method for identifying SNPs is presented in the Examples below. Briefly, DOP-PCR is performed using genomic DNA obtained from an individual. The products are separated on an agarose gel. The products are separated by approximate length into approximately 8 segments having sizes of about 400-1000 base pairs, and libraries are made from each of the segments. This approach prevents domination of the library by one or two abundant products. Plasmid DNA is isolated from individual colonies containing portions of the library. Inserts are isolated and the ends of the inserts are sequenced using vector primers. A new set of primers is then synthesized based on these insert sequences to allow PCR to be performed using RCG obtained from one or more individuals or from a pool of individuals. The DNA products generated by the PCR are sequenced and inspected for the presence of two nucleotide residues at one location, an indication that a polymorphism exists at that position within one of the alleles.
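The final inspection step, looking for more than one nucleotide residue at a single location, can be sketched in simplified form. The example below assumes that base calls from several individuals have already been aligned to equal length; it does not model sequencing traces from pooled DNA, and the function name and example sequences are hypothetical.

```python
def candidate_snp_positions(aligned_sequences):
    """Report 0-based positions at which aligned sequences show two or more bases.

    Characters other than A, C, G and T (for example 'N' or '-') are ignored.
    """
    candidates = []
    for position, column in enumerate(zip(*aligned_sequences)):
        bases = {base for base in column if base in "ACGT"}
        if len(bases) >= 2:
            candidates.append((position, sorted(bases)))
    return candidates


# Hypothetical aligned sequences of the same locus from three individuals.
sequences = [
    "ACGTAC",
    "ACGTAC",
    "ACATAC",
]
print(candidate_snp_positions(sequences))
```

A position flagged this way is only a candidate; sequencing errors also produce apparent differences, so confirmation in additional individuals would normally be needed.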

[0121] A “primer” as used herein is a polynucleotide which hybridizes with a target nucleic acid with which it is complementary and which is capable of acting as an initiator of nucleic acid synthesis under conditions for primer extension. Primer extension conditions include hybridization between the primer and template, the presence of free nucleotides, a chain extender enzyme, e.g., DNA polymerase, and appropriate temperature and pH.

[0122] In preferred embodiments, a set of primers is prepared by at least the following steps: preparing a RCG, composed of a set of PCR products, separating the set of PCR products into individual PCR products, determining the sequence of each end of at least one of the PCR products, and generating the set of primers for use in the subsequent PCR step based on the sequence of the ends of the insert(s).

[0123] A “set of PCR products”, as used herein, is a plurality of synthetic polynucleotide sequences, each polynucleotide sequence being different from one another except for a stretch of nucleotides in the 5′ and 3′ regions of the polynucleotides which are identical in each polynucleotide. These regions correspond to the primers used to generate the RCG and the sequence in these regions varies depending on what primer is used. When a DOP PCR primer is used, the sequence that varies in each primer preferably has a sequence N_(x), wherein x is 5-12 and N is any nucleotide. A set of DNA products is different from a “set of PCR products” as used herein and refers to DNA generated by PCR using specific primers which amplify a specific locus.

[0124] Once the sequence of a primer is known, the primer may be purified from a nucleic acid preparation which includes it, or it may be prepared synthetically. For instance, nucleic acid fragments may be isolated from nucleic acid sequences in genomes, plasmids, or other vectors by site-specific cleavage, etc. Alternatively, the primers may be prepared by de novo chemical synthesis, such as by using phosphotriester or phosphodiester synthetic methods, such as those described in U.S. Pat. No. 4,356,270; Itakura et al. (1989), Ann. Rev. Biochem., 53:323-56; and Brown et al. (1979), Meth. Enzymol., 68:109. Primers may also be prepared using recombinant technology, such as that described in Sambrook, “Molecular Cloning: A Laboratory Manual,” Cold Spring Harbor Laboratory, p. 390-401 (1982).

[0125] The term “nucleotide residue” refers to a single monomeric unit of a nucleic acid such as DNA or RNA. The term “base pair” refers to two nucleotide residues which are complementary to one another and are capable of hydrogen bonding with one another. Traditional base pairs are between G:C and T:A. The letters G, C, T, U and A refer to (deoxy)guanosine, (deoxy)cytidine, (deoxy)thymidine, uridine, and (deoxy)adenosine, respectively. The term “nucleic acids” as used herein refers to a class of molecules including single stranded and double stranded deoxyribonucleic acid (DNA), ribonucleic acid (RNA), and polynucleotides. Nucleic acids within the scope of the invention include naturally occurring and synthetic nucleic acids, nucleic acid analogs, modified nucleic acids, nucleic acids containing modified nucleotides, modified nucleic acid analogs, and mixtures of any of these.

[0126] SNPs identified or detected in the genotyping methods described herein can also be identified by other methods known in the art. Many methods have been described for identifying SNPs (see e.g. WO95/12607; Botstein et al., Am. J. Hum. Genet., 32:314-331 (1980), etc.). In some embodiments, it is preferred that SNPs be identified using the same method that will subsequently be used for genotype analysis.

[0127] As discussed briefly above, the SNPs and RCGs of the invention are useful for a variety of purposes. For instance, SNPs and RCGs are useful for performing genotyping analysis; for identification of a subject, such as in paternity or maternity testing, in immigration and inheritance disputes, in breeding tests in animals, in zygosity testing in twins, in tests for inbreeding in humans and animals; in evaluation of transplant suitability such as with bone marrow transplants; in identification of human and animal remains; in quality control of cultured cells; in forensic testing such as forensic analysis of semen samples, blood stains, and other biological materials; in characterization of the genetic makeup of a tumor by testing for loss of heterozygosity; in determining the allelic frequency of a particular SNP; and in generating a genomic classification code for a genome by identifying the presence or absence of each of a panel of SNPs in the genome of a subject and optionally determining the allelic frequency of the SNPs.

[0128] A preferred use of the invention is in a high throughput method of genotyping. “Genotyping” is the process of identifying the presence or absence of specific genomic sequences within genomic DNA. Distinct genomes may be isolated from individuals of populations which are related by some phenotypic characteristic, by familial origin, by physical proximity, by race, by class, etc. in order to identify polymorphisms (e.g. ones associated with a plurality of distinct genomes) which are correlated with the phenotype, family, location, race, class, etc. Alternatively, distinct genomes may be isolated at random from populations such that they have no relation to one another other than their origin in the population. Identification of polymorphisms in such genomes indicates the presence or absence of the polymorphisms in the population as a whole, but not necessarily correlated with a particular phenotype.

[0129] Although genotyping is often used to identify a polymorphism associated with a particular phenotypic trait, this correlation is not necessary. Genotyping only requires that a polymorphism, which may or may not reside in a coding region, is present. When genotyping is used to identify a phenotypic characteristic, it is presumed that the polymorphism affects the phenotypic trait being characterized. A phenotype may be desirable, detrimental, or, in some cases, neutral.

[0130] Polymorphisms identified according to the methods of the invention can contribute to a phenotype. Some polymorphisms occur within a protein coding sequence and thus can affect the protein structure, thereby causing or contributing to an observed phenotype. Other polymorphisms occur outside of the protein coding sequence but affect the expression of the gene. Still other polymorphisms merely occur near genes of interest and are useful as markers of that gene. A single polymorphism can cause or contribute to more than one phenotypic characteristic and, likewise, a single phenotypic characteristic may be due to more than one polymorphism. In general, multiple polymorphisms occurring within a gene correlate with the same phenotype. Additionally, whether an individual is heterozygous or homozygous for a particular polymorphism can affect the presence or absence of a particular phenotypic trait.

[0131] Phenotypic correlation is performed by identifying an experimental population of subjects exhibiting a phenotypic characteristic and a control population which do not exhibit that phenotypic characteristic. Polymorphisms which occur within the experimental population of subjects sharing a phenotypic characteristic and which do not occur in the control population are said to be polymorphisms which are correlated with a phenotypic trait. Once a polymorphism has been identified as being correlated with a phenotypic trait, genomes of subjects which have potential to develop a phenotypic trait or characteristic can be screened to determine occurrence or non-occurrence of the polymorphism in the subjects' genomes in order to establish whether those subjects are likely to eventually develop the phenotypic characteristic. These types of analyses are generally carried out on subjects at risk of developing a particular disorder such as Huntington's disease or breast cancer.
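The comparison of an experimental population with a control population can be sketched as a set operation. The example follows the strict wording above (polymorphisms that occur in the experimental population and do not occur in the control population); actual association studies typically compare allele frequencies statistically rather than by strict presence or absence. The data layout, identifiers, and function name are hypothetical.

```python
def correlated_polymorphisms(experimental, control):
    """Return polymorphisms seen in the experimental population but not in controls.

    Each population is a list with one set of polymorphism identifiers per subject.
    """
    seen_in_experimental = set().union(*experimental)
    seen_in_control = set().union(*control)
    return seen_in_experimental - seen_in_control


# Hypothetical SNP alleles detected in each subject.
experimental_population = [{"snp1:A", "snp2:T"}, {"snp1:A"}]
control_population = [{"snp2:T"}, {"snp3:G"}]

print(correlated_polymorphisms(experimental_population, control_population))
```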

[0132] A phenotypic trait encompasses any type of genetic disease, condition, or characteristic, the presence or absence of which can be positively determined in a subject. Phenotypic traits that are genetic diseases or conditions include multifactorial diseases of which a component may be genetic (e.g. owing to occurrence in the subject of a SNP), and predisposition to such diseases. Such diseases include, but are not limited to, asthma, cancer, autoimmune diseases, inflammation, blindness, ulcers, heart or cardiovascular diseases, nervous system disorders, and susceptibility to infection by pathogenic microorganisms or viruses. Autoimmune diseases include, but are not limited to, rheumatoid arthritis, multiple sclerosis, diabetes, systemic lupus erythematosus, and Graves' disease. Cancers include, but are not limited to, cancers of the bladder, brain, breast, colon, esophagus, kidney, hematopoietic system (e.g. leukemia), liver, lung, oral cavity, ovary, pancreas, prostate, skin, stomach, and uterus. A phenotypic characteristic includes any attribute of a subject other than a disease or disorder, the presence or absence of which can be detected. Such characteristics can, in some instances, be associated with occurrence of a SNP in a subject which exhibits the characteristic. Examples of characteristics include, but are not limited to, susceptibility to drug or other therapeutic treatments, appearance, height, color (e.g. of flowering plants), strength, speed (e.g. of race horses), hair color, etc. Many examples of phenotypic traits associated with genetic variation have been described, see e.g., U.S. Pat. No. 5,908,978 (which identifies association of disease resistance in certain species of plants with genetic variations) and U.S. Pat. No. 5,942,392 (which describes genetic markers associated with development of Alzheimer's disease).

[0133] Identification of associations between genetic variations (e.g. occurrence of SNPs) and phenotypic traits is useful for many purposes. For example, identification of a correlation between the presence of a SNP allele in a subject and the ultimate development by the subject of a disease is particularly useful for administering early treatments, or instituting lifestyle changes (e.g., reducing cholesterol or fatty foods in order to avoid cardiovascular disease in subjects having a greater-than-normal predisposition to such disease), or closely monitoring a patient for development of cancer or other disease. It may also be useful in prenatal screening to identify whether a fetus is afflicted with or is predisposed to develop a serious disease. Additionally, this type of information is useful for screening animals or plants bred for the purpose of enhancing or exhibiting desired characteristics.

[0134] One method for determining a genotype associated with a plurality of genomes is screening for the presence or absence of a SNP in a plurality of RCGs. For example, such screening may be performed using a hybridization reaction including a SNP-ASO and the RCGs. Either the SNP-ASO or the RCGs can, optionally, be immobilized on a surface. The genotype is determined based on whether the SNP-ASO hybridizes with at least some of the RCGs. Other methods for determining a genotype involve methods which are not based on hybridization, including, but not limited to, mass spectrometric methods. Methods for performing mass spectrometry using nucleic acid samples have been described. See e.g., U.S. Pat. No. 5,885,775. The components of the RCG can be analyzed by mass spectrometry to identify the presence or absence of a SNP allele in the RCG.

[0135] A “SNP-ASO”, as used herein, is an oligonucleotide which includes one of two alternative nucleotides at a polymorphic site within its nucleotide sequence. In some embodiments, it is preferred that the oligonucleotide include only a single mismatched nucleotide residue, namely the polymorphic residue, relative to an allele of a SNP. In other cases, however, the oligonucleotide may contain additional nucleotide mismatches such as neutral bases or may include nucleotide analogs. This is described in more detail below. In preferred embodiments, the SNP-ASO is composed of from about 10 to 50 nucleotide residues. In more preferred embodiments, it is composed of from about 10 to 25 nucleotide residues.

[0136] Oligonucleotides may be purchased from commercial sources such as Genosys, Inc., Houston, Tex. or, alternatively, may be synthesized de novo on an Applied Biosystems 381A DNA synthesizer or equivalent type of machine.

[0137] The oligonucleotides may be labeled by any method known in the art. One preferred method is end-labeling, which can be performed as described in Maniatis et al., “Molecular Cloning: A Laboratory Manual”, Cold Spring Harbor Laboratories, Cold Spring Harbor, N.Y. (1982).

[0138] It is possible that in organisms having a relatively non-complex genome, only a minimal complexity reduction step is necessary, and the genomic DNA may be directly analyzed or minimally reduced. This is particularly useful for screening tissue isolates to detect the presence of a bacterium or to identify the bacteria. Additionally, it is possible that, upon development of certain technical advances (e.g., more stringent hybridization, more sensitive detection equipment), even complex genomes may not need an extensive complexity reduction step.

[0139] Preferably, automated genotyping is performed. In general, genomic DNA of a well-characterized set of subjects, such as the CEPH families, is processed using PCR with appropriate primers to produce RCGs. The DNA is spotted onto one or more surfaces (e.g., multiple glass slides) for genotyping. This process can be performed using a microarray spotting apparatus which can spot more than 1,000 samples within a square centimeter area, or more than 10,000 samples on a typical microscope slide. Each slide is hybridized with a fluorescently tagged allele-specific SNP oligonucleotide under TMAC conditions analogous to those described below. The genotype of each individual can be determined by detecting the presence or absence of a signal for a selected set of SNP-ASOs. A schematic of the method is shown in FIG. 4.
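Determining a genotype from the presence or absence of signal can be sketched for a single diallelic SNP. The example assumes that one SNP-ASO is used for each of the two alleles and that each hybridization has already been scored as a binary (+/−) signal; the function name and return labels are hypothetical.

```python
def call_genotype(signal_allele_1, signal_allele_2):
    """Call a diploid genotype from binary signals of two allele-specific oligos."""
    if signal_allele_1 and signal_allele_2:
        return "heterozygous"
    if signal_allele_1:
        return "homozygous allele 1"
    if signal_allele_2:
        return "homozygous allele 2"
    # Neither oligonucleotide hybridized: failed spot or locus absent from the RCG.
    return "no call"


# Hypothetical +/- results for three spotted samples: (allele 1 ASO, allele 2 ASO).
results = {"sample_a": (True, False), "sample_b": (True, True), "sample_c": (False, False)}
for sample, (a1, a2) in results.items():
    print(sample, call_genotype(a1, a2))
```

The three informative outcomes correspond to the three genotypes possible for a diallelic polymorphism in a diploid organism.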

[0140] Once the complexity of genomic DNA obtained from an individual has been reduced, the resulting genomic DNA fragments can be attached to a solid support in order to be analyzed by hybridization. The RCG fragments may be attached to the slide by any method for attaching DNA to a surface. Methods for immobilizing nucleic acids have been described extensively, e.g., in U.S. Pat. Nos. 5,679,524; 5,610,287; 5,919,626; and 5,445,934. For instance, DNA fragments may be spotted onto poly-L-lysine-coated glass slides, and then crosslinked by UV irradiation. A second, more preferred method, which has been developed, involves including a 5′ amino group on each of the DNA fragments of the RCG. The DNA fragments are spotted onto silane-coated slides in the presence of NaOH in order to covalently attach the fragments to the slide. This method is advantageous because a covalent bond is formed between the fragments and the surface. Another method for accomplishing DNA fragment immobilization is to spot the RCG fragments onto a nylon membrane. Other methods of binding DNA to surfaces are possible and are well known to those of ordinary skill in the art. For instance, attachment to amino-alkyl-coated slides can be used. More detailed methods are described in the Examples below.

[0141] The surface to which the oligonucleotide arrays are conjugated is preferably a rigid or semi-rigid support which may, optionally, have appropriate light absorbing or transmitting characteristics for use with commercially available detection equipment. Substrates which are commonly used and which have appropriate light absorbing or transmitting characteristics include, but are not limited to, glass, Si, Ge, GaAs, GaP, SiO₂, Si₃N₄, modified silicon, and polymers such as (poly)tetrafluoroethylene, (poly)vinylidenedifluoride, polystyrene, polycarbonate, or combinations thereof. Additionally, the surface of the support may be non-coated or coated with a variety of materials. Coatings include, but are not limited to, polymers, plastics, resins, polysaccharides, silica or silica-based materials, carbon, metals, inorganic glasses, and membranes.

[0142] In one embodiment the SNP-ASOs are hybridized under standard hybridization conditions with RCGs covalently conjugated to a surface. Briefly, SNP-ASOs are labeled at their 5′ ends. A hybridization mixture containing the SNP-ASOs and, optionally, an isostabilizing agent, denaturing agent, or renaturation accelerant is brought into contact with an array of RCGs immobilized on the surface and the mixture and the surface are incubated under appropriate hybridization conditions. The SNP-ASOs which do not hybridize are removed by washing the array with a wash mixture (such as a hybridization buffer) to leave only hybridized SNP-ASOs attached to the surface. After washing, detection of the label (e.g., a fluorescent molecule) is performed. For example, an image of the surface can be captured (e.g., using a fluorescence microscope equipped with a CCD camera and automated stage capabilities, phosphoimager, etc.). The label may also, or instead, be detected using a microarray scanner (e.g. one made by Genetic Microsystems). A microarray scanner provides image analysis which can be converted to a binary (i.e. +/−) signal for each sample using, for example, any of several available software applications (e.g., NIH Image, ScanAnalyze, etc.) and exported in a data format. The high signal/noise ratio for this analysis allows determination of data in this mode to be straightforward and easily automated. These data, once exported, can be manipulated via software to generate a format which can be directly analyzed by human genetics applications (such as CRI-MAP and LINKAGE). Additionally, the methods may utilize two or more fluorescent dyes which can be spectrally differentiated to reduce the number of samples to be analyzed. For instance, if four fluorescent dyes having spectral distinctions (e.g., ABI Prism dyes 6-FAM, HEX, NED, ROX) are used, then four hybridization reactions can be carried out under a single hybridization condition. In other embodiments discussed in more detail below, the SNP-ASOs are conjugated to a surface and hybridized with RCGs.
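Conversion of scanner intensities to a binary (+/−) signal can be sketched as a simple threshold on the signal-to-background ratio. The threshold value, the data layout, and the function name below are assumptions made for illustration; they are not parameters taken from this description or from any particular image analysis package.

```python
def binarize_signals(intensities, background, min_ratio=3.0):
    """Convert per-sample spot intensities into +/- calls.

    A sample is scored positive when its intensity is at least `min_ratio`
    times the background intensity. `min_ratio` is an assumed cutoff.
    """
    if background <= 0:
        raise ValueError("background intensity must be positive")
    return {sample: (value / background) >= min_ratio
            for sample, value in intensities.items()}


# Hypothetical spot intensities in arbitrary fluorescence units.
spots = {"sample_a": 900.0, "sample_b": 120.0, "sample_c": 450.0}
calls = binarize_signals(spots, background=100.0)
print(calls)
```

With a high signal/noise ratio, positive and negative spots fall far from any reasonable cutoff, which is what makes this kind of scoring easy to automate.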

[0143] Conditions for optimal hybridization are described below in the Examples. In general, the SNP-ASO is present in a hybridization mixture at a concentration of from about 0.005 nanomoles per liter (nM) to about 50 nM. More preferably, the concentration is from 0.5 nanomoles per liter to 1 nanomole per liter. A preferred concentration when radioactive labels are used is 0.66 nanomoles per liter. The mixture preferably also includes a hybridization optimizing agent in order to improve signal discrimination between genomic sequences which are identically complementary to the SNP-ASO and those which contain a single mismatched nucleotide (as well as any neutral base etc. substitutions). Isostabilizing agents are compounds such as betaines and lower tetraalkyl ammonium salts which reduce the sequence dependence of DNA thermal melting transitions. These types of compounds also increase discrimination between matched and mismatched SNPs/genomes. A denaturing agent may also be included in the hybridization mixture. A denaturing agent is a composition that lowers the melting temperature of double stranded nucleic acid molecules, generally by reducing hydrogen bonding between bases or preventing hydration of nucleic acid molecules. Denaturing agents are well-known in the art and include, for example, DMSO, formaldehyde, glycerol, urea, formamide, and chaotropic salts. The hybridization conditions in general are those used commonly in the art, such as those described in Sambrook et al., “Molecular Cloning: A Laboratory Manual”, (1989), 2nd Ed., Cold Spring Harbor, N.Y.; Berger and Kimmel, “Guide to Molecular Cloning Techniques”, Methods in Enzymology, (1987), Volume 152, Academic Press, Inc., San Diego, Calif.; and Young and Davis, (1983), PNAS (USA) 80:1194.

[0144] In general, incubation temperatures for hybridization of nucleic acids range from about 20° C. to 75° C. For probes 17 nucleotide residues and longer, a preferred temperature range for hybridization is from about 50° C. to 54° C. The hybridization temperature for longer probes is preferably from about 55° C. to 65° C. and for shorter probes is less than 52° C. Hybridization may be performed in a variety of time frames. Preferably, hybridization of SNP-ASOs and RCGs is performed for at least 30 minutes.

[0145] Preferably, either or both of the SNP-ASO and the RCG are labeled. The label may be added directly to the SNP-ASO or the RCG during synthesis of the oligonucleotide or during generation of RCG fragments. For instance, a PCR reaction performed using labeled primers or labeled nucleotides will produce a labeled product. Labeled nucleotides (e.g., fluorescein-labeled CTP) are commercially available. Methods for attaching labels to nucleic acids are well known to those of ordinary skill in the art and, in addition to the PCR method, include, for example, nick translation and end-labeling.

[0146] Labels suitable for use in the methods of the present invention include any type of label detectable by standard means, including spectroscopic, photochemical, biochemical, electrical, optical, or chemical methods. Preferred types of labels include fluorescent labels such as fluorescein. A fluorescent label is a compound comprising at least one fluorophore. Commercially available fluorescent labels include, for example, fluorescein phosphoramidites such as fluoreprime (Pharmacia, Piscataway, N.J.), fluoredite (Millipore, Bedford, Mass.), FAM (ABI, Foster City, Calif.), rhodamine, polymethadine dye derivative, phosphors, Texas red, green fluorescent protein, CY3, and CY5. Polynucleotides can be labeled with one or more spectrally distinct fluorescent labels. “Spectrally distinct” fluorescent labels are labels which can be distinguished from one another based on one or more of their characteristic absorption spectra, emission spectra, fluorescent lifetimes, or the like. Spectrally distinct fluorescent labels have the advantage that they may be used in combination (“multiplexed”). Radionuclides such as ³H, ¹²⁵I, ³⁵S, ¹⁴C, or ³²P are also useful labels according to the methods of the invention. A plurality of radioactively distinguishable radionuclides can be used. Such radionuclides can be distinguished, for example, based on the type of radiation (e.g. α, β, or γ radiation) emitted by the radionuclides. The ³²P signal can be detected using a phosphorimager, which currently has a resolution of approximately 50 microns. Other known techniques, such as chemiluminescence or colorimetric detection (enzymatic color reaction), can also be used.

[0147] By using spectrally distinct fluorescent probes, it is possible to analyze more than one locus in a single hybridization mixture. The term “multiplexing” refers to the use of a set of distinct fluorescent labels in a single assay. Such fluorescent labels have been described extensively in the art, such as the fluorescent labels described in PCT Published Patent Application WO98/31834.

[0148] Fluorescent primers are a preferred means of labeling polynucleotides. The fluorescent tag is stable for more than a year. Radioactively labeled primers are stable for a shorter period. In addition, fluorescent primers may be used in combination if they are spectrally distinct, as discussed above. This allows multiple hybridizations to be detected in a single hybridization mixture. As a result, the total number of reactions needed for a genome-wide scan is reduced. For example, for analysis of 1000 loci, 2000 hybridizations are needed (1000 loci × 2 alleles per locus). The use of 4 spectrally distinct fluorescently-labeled oligonucleotides will cut this number 4-fold and thus only 500 hybridizations will be needed.
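The arithmetic in the preceding paragraph can be sketched as follows. This is only an illustrative calculation; the function name and its parameters are hypothetical and do not appear in the specification, and it assumes that every hybridization mixture can carry the same number of spectrally distinct labels.

```python
import math


def hybridizations_needed(num_loci, alleles_per_locus=2, labels_per_mixture=1):
    """Number of hybridization mixtures for a scan (hypothetical helper).

    Each allele of each locus needs one labeled SNP-ASO; up to
    `labels_per_mixture` spectrally distinct labels share one mixture.
    """
    total_probes = num_loci * alleles_per_locus
    return math.ceil(total_probes / labels_per_mixture)


# One label per mixture: 1000 loci x 2 alleles.
print(hybridizations_needed(1000))
# Four spectrally distinct labels per mixture.
print(hybridizations_needed(1000, labels_per_mixture=4))
```

With one label per mixture the sketch gives 2000 hybridizations, and with four labels it gives 500, matching the figures stated above.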

[0149] In order to determine the genotype of an individual at a SNP locus, it is desirable to employ SNP allele-specific oligonucleotide hybridization. Preferably, two hybridization mixtures are prepared for each locus (or they can be performed together). The first hybridization mixture contains a labeled (e.g., radioactive or fluorescent) SNP-ASO (typically 17-21 nucleotide residues in length centered around the polymorphic residue). To increase specificity, a 20-50 fold excess of non-labeled oligonucleotides corresponding to another allele (referred to herein as a “complementary SNP-ASO”) is included in the hybridization mixture. Use of the non-labeled complementary SNP-ASO can be avoided by using a SNP-ASO containing a neutral base as described below. In the second hybridization mixture, the SNP-ASO that was labeled in the first mixture is not labeled, and the non-labeled SNP-ASO is labeled instead. Hybridization is performed in the presence of a hybridization buffer. The melting temperature of oligonucleotides can be determined empirically for each experiment. The two oligonucleotides corresponding to different alleles of the same SNP (the SNP-ASO and the complementary SNP-ASO) are referred to herein as a pair of allele-specific oligonucleotides (ASOs). Further experimental details regarding selecting and making SNP-ASOs are provided in the Examples section below.

[0150] In addition to the method described above, several other methods of allele specific hybridization may be used for hybridizing SNP-ASOs with RCGs. One method is to increase discrimination of SNPs in DNA hybridization by means of artificial mismatches. Artificial mismatches are inserted into oligonucleotide probes using a neutral base such as the base analog 3-nitropyrrole. A significant enhancement of discrimination is generally obtained, with a strong dependence of the enhancement on the spacing between mismatches.

[0151] In general, the methods described above are based on conjugation of genomic DNA fragments (i.e. a RCG) to a solid support. Hybridization analysis can also be performed with the SNP-ASO conjugated to the support (e.g. in an array). The oligonucleotide array is hybridized with one or more RCGs. Attachment of the SNP-ASOs or RCGs to the support may be performed by any method known in the art. Many methods for attaching oligonucleotides to surfaces in arrays have been described, see, e.g. PCT Published Patent Application WO97/29212, U.S. Pat. Nos. 4,588,682; 5,667,976; and 5,760,130. Other methods include, for example, using arrays of metal pins. Additionally, RCGs may be attached to the surface by the methods disclosed in the Examples below.

[0152] An “array” as used herein is a set of molecules arranged in a specific order with respect to a surface. Preferably the array is composed of polynucleotides (e.g. either SNP-ASOs or RCGs) attached to the surface. Oligonucleotide arrays can be used to screen nucleic acid samples for a target nucleic acid, which can be labeled with a detectable marker. A fluorescent signal resulting from hybridization between a target nucleic acid and a substrate-bound oligonucleotide provides information relating to the identity of the target nucleic acid by reference to the location of the oligonucleotide in the array on the substrate. Such a hybridization assay can generate thousands of signals which exhibit different signal strengths. These signals correspond to particular oligonucleotides of the array. Different signal strengths will arise based on the amount of labeled target nucleic acid hybridized with an oligonucleotide of the array. This amount, in turn, can be influenced by the proportion of AT-rich regions and GC-rich regions within the oligonucleotide (which determines thermal stability). The relative amounts of hybridized target nucleic acid can also be influenced by, for example, the number of different probes arrayed on the substrate, the length of the target nucleic acid, and the degree of hybridization between mismatched residues. Oligonucleotide arrays, in some embodiments, have a density of at least 500 features per square centimeter, but in practice can have much lower densities. A feature, as used herein, is an area of a substrate on which oligonucleotides having a single sequence are immobilized.

[0153] The oligonucleotide arrays of the invention may be produced by any method known in the art. Many such arrays are commercially available, and many methods have been described for producing them. One preferred method for producing arrays includes spatially directed oligonucleotide synthesis. Spatially directed oligonucleotide synthesis may be performed using light-directed oligonucleotide synthesis, microlithography, application by ink jet, microchannel deposition to specific locations, and sequestration with physical barriers. Each of these methods is well-known in the art and has been described extensively. For instance, the light-directed oligonucleotide synthesis method has been disclosed in U.S. Pat. Nos. 5,143,854; 5,489,678; and 5,571,639; and PCT applications having publication numbers WO90/15070; WO92/10092; and WO94/12305. This technique involves modification of the surface of the solid support with linkers and photolabile protecting groups using a photolithographic mask to produce reactive (e.g. hydroxyl) groups in the illuminated regions. A 3′-O-phosphoramidite-activated deoxynucleoside having a 5′-hydroxyl-protected group is supplied to the surface such that coupling occurs at sites that were exposed to light. The substrate is rinsed, the surface is illuminated with a second mask, and another activated deoxynucleoside is presented to the surface. The cycle is repeated until the desired set of products is obtained. After the cycle is finished, the nucleotides can be capped. Another method involves mechanically protecting portions of the surface and selectively deprotecting/coupling materials to the exposed portions of the surface, such as the method described in U.S. Pat. No. 5,384,261. The mechanical means is generally referred to as a mask. Other methods for array preparation are described in PCT Published Patent Applications WO97/39151, WO98/20967, and WO98/10858, which describe an automated apparatus for the chemical synthesis of molecular arrays, U.S. Pat. No. 5,143,854, Fodor et al., Science (1991), 251:767-777 and Kozal et al., Nature Medicine, v. 2, p. 753-759 (1996).

[0154] Hybridizing a SNP-ASO with an array of RCGs (or hybridizing a RCG with an array of SNP-ASOs) is followed by detection of hybridization. Part of the genotyping methods described herein is to determine whether a positive or negative signal exists for each hybridization for an individual and then, based on this information, to determine the genotype for the corresponding SNP locus. This step is relatively straightforward, but varies depending on the method of detection. Essentially, all of the detection methods described here (fluorescent, radioactive, etc.) can be reduced to a digital image file, e.g. using a microarray reader or phosphorimager. Presently, there are several software products which will overlay a grid on an image and determine the signal strength value for each element of the grid. These values can be imported into a computer program, such as the Microsoft Corporation spreadsheet program designated Microsoft Excel™, with which simple analysis can be performed to assign each signal a manipulable value (e.g. 1 or 0 or + or −). Once this is accomplished, an individual's genotype can be described in terms of the pattern of hybridization of RCG fragments obtained from the individual with selected SNP-ASOs corresponding to disease-associated SNPs.
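As a minimal sketch of assigning each grid element a manipulable value, the following assumes that signal strengths have already been extracted from the image and that a single fixed cutoff separates positive from negative signals. The function name, the grid coordinates, the intensity values, and the cutoff are all hypothetical; the specification does not prescribe a particular thresholding rule.

```python
def binarize_signals(signals, threshold):
    """Map each grid position to 1 (positive) or 0 (negative).

    `signals` maps a (row, column) grid position to a signal strength.
    A fixed threshold is an assumption made for this illustration.
    """
    return {pos: 1 if value >= threshold else 0 for pos, value in signals.items()}


# Hypothetical signal strengths for a 2 x 2 grid.
grid = {(0, 0): 1520.0, (0, 1): 85.0, (1, 0): 40.0, (1, 1): 2210.0}
calls = binarize_signals(grid, threshold=500.0)
print(calls)
```

In practice the cutoff would likely be set per experiment, for example from negative control spots, rather than fixed in advance.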

[0155] The array having labeled SNP-ASOs (or labeled RCGs) hybridized thereto can be analyzed using automated equipment. Automated equipment for analyzing arrays can include an excitation radiation source which emits radiation at a first wavelength, an optical detector, and a stage for securing the surface supporting the array. The excitation source emits excitation radiation which is focused on at least one area of the array and which induces emission from fluorescent labels. The signal is preferably in the form of radiation having a different wavelength than the excitation radiation. Emitted radiation is collected by a detector, which generates a signal proportional to the amount of radiation sensed thereon. The array may then be moved so that a different area can be exposed to the radiation source to produce a signal. Once each area of the array has been scanned, a two-dimensional image of the array is obtained. Preferably, the movement of the array is accomplished using automated equipment, such as a multi-axis translation stage, such as one which moves the array at a constant velocity. In alternative embodiments, the array may remain stationary, and devices may be employed to cause scanning of the light over the stationary array.

[0156] One type of detection method includes a CCD imaging system, e.g. when the nucleic acids are labeled with fluorescent probes. Other detectors are well known to those of skill in the art and may also, or alternatively, be used. CCD imaging systems for use with array detection have been described. For instance, a photodiode detector may be placed on the opposite side of the array from the excitation source. Alternatively, a CCD camera may be used in place of the photodiode detector to image the array. One advantage of using these systems is rapid read time. In general, an entire 50×50 centimeter array can be read in about 30 seconds or less using standard equipment. If more powerful equipment and efficient dyes are used, the read time may be reduced to less than 5 seconds.

[0157] Once the data is obtained, e.g. as a two-dimensional image, a computer can be used to transform the data into a displayed image which varies in color depending on the intensity of light emission at a particular location. Any type of commercial software which can perform this type of data analysis can be used. In general, the data analysis involves the steps of determining the intensity of the fluorescence emitted as a function of the position on the substrate, removing the outliers, and calculating the relative binding affinity. One or more of the presence, absence, and intensity of signal corresponding to a label is used to assess the presence or absence of an SNP corresponding to the label in the RCG. The presence and absence of one or more SNPs in a RCG can be used to assign a genotype to the individual. For example, the following depicts the genotype analysis of 3 individuals at a given locus at which an A/G polymorphism occurs:

Individual    SNP 1 Allele “A”    SNP 1 Allele “G”    Genotype
Larry         +                   −                   A/A
Moe           −                   +                   G/G
Curly         +                   +                   A/G
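The assignment of a genotype from the presence or absence of signal for each allele can be sketched as below for a single A/G locus. The function name is hypothetical, and returning no call when neither allele gives a signal is an assumption made for this illustration rather than a rule stated in the specification.

```python
from typing import Optional


def call_genotype(allele_a_signal: bool, allele_g_signal: bool) -> Optional[str]:
    """Assign a genotype at an A/G locus from two hybridization results."""
    if allele_a_signal and allele_g_signal:
        return "A/G"  # both SNP-ASOs hybridize: heterozygous
    if allele_a_signal:
        return "A/A"
    if allele_g_signal:
        return "G/G"
    return None  # no signal for either allele: no call (assumption)


# (Allele "A" signal, Allele "G" signal) for the three individuals.
results = {"Larry": (True, False), "Moe": (False, True), "Curly": (True, True)}
for name, (a, g) in results.items():
    print(name, call_genotype(a, g))
```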

[0158] As mentioned above, SNP analysis can be used to determine whether an individual has or will develop a particular phenotypic trait and whether the presence or absence of a specific allele correlates with a particular phenotypic trait. In order to determine which SNPs are related to a particular phenotypic trait, genomic samples are isolated from a group of individuals which exhibit the particular phenotypic trait, and the samples are analyzed for the presence of common SNPs. The genomic sample obtained from each individual is used to prepare a RCG. These RCGs are screened using panels of SNPs in a high throughput method of the invention to determine whether the presence or absence of a particular allele is associated with the phenotype. In some cases, it may be possible to predict the likelihood that a particular subject will exhibit the related phenotype. For example, if a particular polymorphic allele is present in 30% of individuals who develop Alzheimer's disease, then an individual having that allele has a higher likelihood of developing Alzheimer's disease. The likelihood can also depend on several factors such as whether individuals not afflicted with Alzheimer's disease have this allele and whether other factors are associated with the development of Alzheimer's disease. This type of analysis can be useful for determining a probability that a particular phenotype will be exhibited. In order to increase the predictive ability of this type of analysis, multiple SNPs associated with a particular phenotype can be analyzed. Although values can be calculated, it is enough to identify that a difference exists.

[0159] It is also possible to identify SNPs which segregate with a particular disease. Multiple polymorphic sites may be detected and examined to identify a physical linkage between them or between a marker (SNP) and a phenotype. Both of these are useful for mapping a genetic locus linked to or associated with a phenotypic trait to a chromosomal position and thereby revealing one or more genes associated with the phenotypic trait. If two polymorphic sites segregate randomly, then they are either on separate chromosomes or are distant enough from one another on the same chromosome that they do not co-segregate. If two sites co-segregate with significant frequency, then they are linked to one another on the same chromosome. These types of linkage analyses are useful for developing genetic maps. See e.g., Lander et al., PNAS (USA) 83, 7353-7357 (1986), Lander et al., Genetics 121, 185-199 (1989). The invention is also useful for identifying polymorphic sites which do not segregate, i.e., when one sibling has a chromosomal region that includes a polymorphic site and another sibling does not have that region.

[0160] Linkage analysis is often performed on family members who exhibit high rates of a particular phenotype or on patients suffering from a particular disease. Biological samples are isolated from each subject exhibiting a phenotypic trait, as well as from subjects who do not exhibit the phenotypic trait. These samples are each used to generate individual RCGs and the presence or absence of polymorphic markers is determined using panels of SNPs. The data can be analyzed to determine whether the various SNPs are associated with the phenotypic trait and whether or not any SNPs segregate with the phenotypic trait.

[0161] Methods for analyzing linkage data have been described in many references, including Thompson & Thompson, Genetics in Medicine (5th edition), W. B. Saunders Co., Philadelphia, 1991; and Strachan, “Mapping the Human Genome” in the Human Genome (Bios Scientific Publishers Ltd., Oxford) chapter 4, and summarized in PCT published patent application WO98/18967 by Affymetrix, Inc. Linkage analysis by calculating log of the odds values (LOD values) reveals the likelihood of linkage between a marker and a genetic locus at a recombination fraction, compared to the value when the marker and genetic locus are not linked. The recombination fraction indicates the likelihood that markers are linked. Computer programs and mathematical tables have been developed for calculating LOD scores of different recombination fraction values and determining the recombination fraction based on a particular LOD score, respectively. See e.g., Lathrop, PNAS, USA 81, 3443-3446 (1984); Smith et al., Mathematical Tables for Research Workers in Human Genetics (Churchill, London, 1961); Smith, Ann. Hum. Genet. 32, 127-1500 (1968). Use of LOD values for genetic mapping of phenotypic traits is described in PCT published patent application WO98/18967 by Affymetrix, Inc. In general, a positive LOD score value indicates that two genetic loci are linked and a LOD score of +3 or greater is strong evidence that two loci are linked. A negative value suggests that the linkage is less likely.
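For illustration only, a LOD score can be computed in the simplest textbook case, in which every meiosis is fully informative and phase is known, as the base-10 logarithm of the likelihood of the data at a recombination fraction θ divided by the likelihood at θ = 0.5 (no linkage). That simplification, the function name, and the example counts are assumptions made here; the programs and tables cited above handle general pedigrees.

```python
import math


def lod_score(recombinants, meioses, theta):
    """LOD score for phase-known, fully informative meioses (simplified).

    Likelihood at theta is theta**k * (1 - theta)**(n - k), compared with
    the likelihood 0.5**n under free recombination.
    """
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must satisfy 0 < theta <= 0.5")
    k, n = recombinants, meioses
    if not 0 <= k <= n:
        raise ValueError("recombinants must lie between 0 and meioses")
    log_linked = k * math.log10(theta) + (n - k) * math.log10(1.0 - theta)
    log_unlinked = n * math.log10(0.5)
    return log_linked - log_unlinked


# Hypothetical data: 1 recombinant among 10 informative meioses.
for theta in (0.05, 0.1, 0.2, 0.5):
    print(theta, round(lod_score(1, 10, theta), 3))
```

Under this simplified model the score is exactly zero at θ = 0.5, and ten meioses with no recombinants approach a score of 3 only as θ approaches zero, which gives a sense of how much data the +3 threshold requires.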

[0162] The methods of the invention are also useful for assessing loss of heterozygosity in a tumor. Loss of heterozygosity in a tumor is useful for determining the status of the tumor, such as whether the tumor is an aggressive, metastatic tumor. The method is generally performed by isolating genomic DNA from tumor samples obtained from a plurality of subjects having tumors of the same type, as well as from normal (i.e., non-cancerous) tissue obtained from the same subjects. These genomic DNA samples are used to generate RCGs which can be hybridized with a SNP-ASO, for example using the surface array technology described herein. The absence of a SNP allele in the RCG generated from the tumor compared to the RCG generated from normal tissue indicates whether loss of heterozygosity has occurred. If a SNP allele is associated with a metastatic state of a cancer, the absence of the SNP allele can be compared to its presence or absence in a non-metastatic tumor sample or a normal tissue sample. A database of SNPs which occur in normal and tumor tissues can be generated and an occurrence of SNPs in a patient's sample can be compared with the database for diagnostic or prognostic purposes.
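The comparison of a tumor RCG with the matched normal RCG can be sketched as follows. The data layout (a mapping from locus name to the set of alleles that gave a positive signal), the function name, and the locus names are hypothetical. Only loci that are heterozygous in the normal tissue are treated as informative, which is the usual convention but is stated here as an assumption.

```python
def loss_of_heterozygosity(normal, tumor):
    """Return, per locus, the alleles seen in normal tissue but not in tumor.

    `normal` and `tumor` map a locus name to the set of alleles detected.
    Loci homozygous in normal tissue, or not assayed in the tumor, are skipped.
    """
    lost = {}
    for locus, normal_alleles in normal.items():
        if len(normal_alleles) < 2 or locus not in tumor:
            continue  # uninformative or missing locus
        missing = normal_alleles - tumor[locus]
        if missing:
            lost[locus] = missing
    return lost


# Hypothetical calls for one subject.
normal_calls = {"SNP1": {"A", "G"}, "SNP2": {"A"}, "SNP3": {"C", "T"}}
tumor_calls = {"SNP1": {"A"}, "SNP2": {"A"}, "SNP3": {"C", "T"}}
print(loss_of_heterozygosity(normal_calls, tumor_calls))
```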

[0163] It is useful to be able to differentiate non-metastatic primary tumors from metastatic tumors, because metastasis is a major cause of treatment failure in cancer patients. If metastasis can be detected early, it can be treated aggressively in order to slow the progression of the disease. Metastasis is a complex process involving detachment of cells from a primary tumor, movement of the cells through the circulation, and eventual colonization of tumor cells at local or distant tissue sites. Additionally, it is desirable to be able to detect a pre-disposition for development of a particular cancer such that monitoring and early treatment may be initiated. Many cancers and tumors are associated with genetic alterations. For instance, an extensive cytogenetic analysis of hematologic malignancies such as lymphomas and leukemias has been described, see e.g., Solomon et al., Science 254, 1153-1160, 1991. Many solid tumors have complex genetic abnormalities requiring more complex analysis.

[0164] Solid tumors progress from tumorigenesis through a metastatic stage and into a stage at which several genetic aberrations can occur. See, e.g., Smith et al., Breast Cancer Res. Treat., 18 Suppl. 1, S5-14, 1991. Genetic aberrations are believed to alter the tumor such that it can progress to the next stage, i.e., by conferring proliferative advantages, the ability to develop drug resistance or enhanced angiogenesis, proteolysis, or metastatic capacity. These genetic aberrations are referred to as “loss of heterozygosity.” Loss of heterozygosity can be caused by a deletion or recombination resulting in a genetic mutation which plays a role in tumor progression. Loss of heterozygosity for tumor suppressor genes is believed to play a role in tumor progression. For instance, it is believed that mutations in the retinoblastoma tumor suppressor gene located in chromosome 13q14 cause progression of retinoblastomas, osteosarcomas, small cell lung cancer, and breast cancer. Likewise, the short arm of chromosome 3 has been shown to be associated with cancers such as small cell lung cancer, renal cancer and ovarian cancers. For instance, ulcerative colitis is a disease which is associated with increased risk of cancer presumably involving a multistep progression involving accumulated genetic changes (U.S. Pat. No. 5,814,444). It has been shown that patients afflicted with long duration ulcerative colitis exhibit an increased risk of cancer, and that one early marker is loss of heterozygosity of a region of the distal short arm of chromosome 8. This region is the site of a putative tumor suppressor gene that may also be implicated in prostate and breast cancer. Loss of heterozygosity can easily be detected by performing the methods of the invention routinely on patients afflicted with ulcerative colitis. Similar analyses can be performed using samples obtained from other tumors known or believed to be associated with loss of heterozygosity.

[0165] The methods of the invention are particularly advantageous for studying loss of heterozygosity because thousands of tumor samples can be screened at one time. Additionally, the methods can be used to identify new regions of loss that have not previously been identified in tumors.

[0166] The methods of the invention are useful for generating a genomic pattern for an individual genome of a subject. The genomic pattern of a genome indicates the presence or absence of polymorphisms, for example, SNPs, within a genome. Genomic DNA is unique to each individual subject (except identical twins). Accordingly, the more polymorphisms that are analyzed for a given genome of a subject, the higher the probability of generating a unique genomic pattern for the individual from which the sample was isolated. The genomic pattern can be used for a variety of purposes, such as for identification with respect to forensic analysis or population identification, or paternity or maternity testing. The genomic pattern may also be used for classification purposes as well as to identify patterns of polymorphisms within different populations of subjects.

[0167] Genomic patterns may be used for many purposes, including forensic analysis and paternity or maternity testing. The use of genomic information for forensic analysis has been described in many references, see e.g., National Research Council, The Evaluation of Forensic DNA Evidence (EDS Pollard et al., National Academy Press, DC, 1996). Forensic analysis of DNA is based on determination of the presence or absence of alleles of polymorphic regions within a genomic sample. The more polymorphisms that are analyzed, the higher the probability of identifying the correct individual from which the sample was isolated.

[0168] In an embodiment of the invention, when a biological sample, such as blood or sperm, is found at a crime scene, DNA can be isolated and an RCG can be prepared. This RCG can then be screened with a panel of SNPs to generate a genomic pattern. The genomic pattern can be matched with a genomic pattern produced from a suspect or compared to a database of genomic patterns which has been compiled. Preferably, the SNPs used in the analysis are those in which the frequency of the polymorphic variation (allelic frequency) has been determined, such that a statistical analysis can be used to determine the probability that the sample genome matches the suspect's genome or a genome within the database. The probability that two individuals have the same polymorphic or allelic form at a given genetic site is described in detail in PCT published patent application WO98/18967, the entire contents of which are hereby incorporated by reference. Briefly, this probability, defined as P(ID), can be determined by the equation:

P(ID) = (x²)² + (2xy)² + (y²)²

[0169] x and y in the equation represent the frequency that an allele A or B will occur in a haploid genome.

[0170] The calculation can be extended for more polymorphic forms at a given locus. The predictability increases with the number of polymorphic forms tested. In a locus of n alleles, a binomial expansion is used to calculate P(ID). The probabilities of each locus can be multiplied to provide the cumulative probability of identity and from this the cumulative probability of non-identity for a particular number of loci can be calculated. This value indicates the likelihood that random individuals have the same loci. The same type of quantitative analysis can be used to determine whether a subject is a parent of a particular child. This type of information is useful in paternity testing, animal breeding studies, and identification of babies or children whose identity has been confused, e.g., through adoption or inadequate record keeping in a hospital, or through separation of families by occurrences such as earthquake or war.
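The probability of identity for a biallelic locus, and its accumulation over several loci, can be sketched as below. The function names and the example allele frequencies are hypothetical. The sketch assumes that the two allele frequencies sum to one (y = 1 − x) and that the loci are independent, so that per-locus probabilities can simply be multiplied.

```python
def p_identity(x):
    """P(ID) for one biallelic locus with allele frequencies x and y = 1 - x."""
    y = 1.0 - x
    return (x ** 2) ** 2 + (2 * x * y) ** 2 + (y ** 2) ** 2


def cumulative_p_identity(allele_frequencies):
    """Multiply per-locus P(ID) values (assumes independent loci)."""
    product = 1.0
    for x in allele_frequencies:
        product *= p_identity(x)
    return product


# Hypothetical panel: ten loci, each with allele frequency 0.5.
panel = [0.5] * 10
p_same = cumulative_p_identity(panel)
print(p_same, 1.0 - p_same)  # cumulative identity and non-identity
```

At x = 0.5 the per-locus value is 0.375, so each additional locus of this kind reduces the cumulative probability of identity by a further factor of 0.375.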

[0171] The genomic pattern may be used to generate a genomicclassification code (GNC).

[0172] The GNC may be represented by one or more data signals and stored as part of a data structure on a computer-readable medium, for example, a database. The stored GNCs may be used to characterize, classify, or identify the subjects for which the GNCs were generated. Each GNC may be generated by representing the presence or absence of each polymorphism with a computer-readable signal. These signals may then be encoded, for example, by performing a function on the signals.

[0173] Accordingly, the GNCs may be used as part of a classification or identification system for subjects such as, for example, humans, plants, or animals. As discussed above, the more polymorphisms that are analyzed for a given genome of a subject, the higher the probability of generating a unique genomic pattern for the individual from which the sample was isolated, and consequently, the higher the probability that the GNC uniquely identifies an individual. In such a system, a data structure may include a plurality of entries, for example, data records or table entries, where each entry identifies an individual. Each entry may include the GNC generated for the individual as well as other information. The GNC or portions thereof may then be stored in an index data structure, for example, another table. A portion of a GNC may be indexed so that each GNC may be further classified by a portion of its genomic pattern as opposed to only the entire genomic pattern.
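One way such a code and a partial index might look is sketched below. The specification does not fix an encoding; packing the presence/absence signals into bits and writing them as hexadecimal is an assumption chosen for this illustration, as are the function names and subject labels. Indexing on a leading portion of the code stands in for classifying by a portion of the genomic pattern.

```python
from collections import defaultdict


def encode_gnc(pattern):
    """Encode presence (True) / absence (False) signals as a hex string."""
    if not pattern:
        raise ValueError("pattern must contain at least one polymorphism")
    bits = "".join("1" if present else "0" for present in pattern)
    width = (len(bits) + 3) // 4  # hex digits needed for this many bits
    return format(int(bits, 2), "0{}x".format(width))


def build_prefix_index(gncs, prefix_len):
    """Index subjects by the leading portion of their GNC."""
    index = defaultdict(list)
    for subject, code in gncs.items():
        index[code[:prefix_len]].append(subject)
    return dict(index)


# Hypothetical eight-polymorphism panel for one subject.
code = encode_gnc([True, False, True, True, False, False, False, True])
print(code)

gncs = {"subject-1": code, "subject-2": "b7", "subject-3": "4e"}
print(build_prefix_index(gncs, prefix_len=1))
```

Because leading zeros are preserved only up to whole hexadecimal digits, a real system would also need to record the size and order of the SNP panel alongside the code.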

[0174] The data structures may then be searched to identify an individual who has committed a crime. For example, if a biological sample from the individual (such as blood) is recovered from the crime scene, the GNC of the individual may be generated by the methods described herein, and a database of records including GNCs searched until a match is found. Similarly, the GNCs may be used to classify individuals within a group such as soldiers in the armed forces, cattle in a herd, or produce within a specific crop. For example, the armed forces may generate a database containing the GNC of each soldier, and the database could be used to identify the soldier if necessary. Likewise, a database could be generated where records and indexes of the database include the GNCs of individual animals within a herd of cattle, so that lost or stolen animals could later be identified and returned to the proper owner.

[0175] The code may optionally be converted into a bar code or other human- or machine-readable form. For example, each line of a bar code may indicate the presence of specific polymorphisms or groups of specific polymorphisms for a particular subject.

[0176] Additionally, it is useful to be able to identify the genus, species, or other taxonomic classification to which an organism belongs. The methods of the invention can accomplish this in a high throughput manner. Taxonomic identification is useful for determining the presence and identity of a pathogenic organism such as a virus, bacterium, protozoan, or multicellular parasite in a tissue sample. In most hospitals, bacteria and other pathogenic organisms are identified based on morphology, determination of nutritional requirements or fermentation patterns, determination of antibiotic resistance, comparison of isoenzyme patterns, or determination of sensitivity to bacteriophage strains. These types of methods generally require approximately 48 to 72 hours to identify the pathogenic organism. More recently, methods for identifying pathogenic organisms have been focused on genotype analysis, for instance, using RFLPs. RFLP analysis has been performed using hybridization methods (such as Southern blots) and PCR assays.

[0177] The information generated according to the methods of the invention, and in particular the GNCs, can be included in a data structure, for example, a database, on a computer-readable medium, wherein the information is correlated with other information pertaining to the genomes or the subjects or types of subjects from which the genomes are obtained. FIG. 5 shows a computer system 100 for storing and manipulating genomic information. The computer system 100 includes a genomic database 102 which includes a plurality of records 104 a-n storing information corresponding to a plurality of genomes. Each of the records 104 a-n may store genetic information about each genome or an RCG generated therefrom. The genomes for which information is stored in the genomic database 102 may be any kind of genomes from any type of subject. For example, the genomes may represent distinct genomes of individual members of a species or of particular classes of individuals, e.g., members of an army, prisoners, etc.

[0178] An example of the format of a record 200 in the genomic database 102 (i.e., one of the records 104 a-n) is shown in FIG. 6A. As shown in FIG. 6A, the record 200 includes a genome identifier (Genome ID) 202 that identifies the genome corresponding to the record 200. If enough polymorphisms of the genome were analyzed to generate the genomic pattern (such that the probability that the GNC uniquely identifies the genome is high), or if a group to which the genome belongs has few enough members, then the GNC of the genome could serve as the Genome ID 202. The record 200 also may include genomic information fields 204 a-n. The genomic information may be any information associated with the genome identified by the Genome ID 202 such as, for example, a GNC, a portion of a GNC, the presence or absence of a particular SNP, a genetic attribute (genotype), a physical attribute (phenotype), a name, a taxonomic identifier, a classification of the genome, a description of the individual from which the genome was taken, a disease of the individual, a mutation, a color, etc. Each information field 204 a-n may be used as an entry in an index data structure that has a structure similar to record 200. For example, each entry of the index data structure may include an indexed information field as a first data element, and one or more Genome IDs 202 as additional elements, such that all elements that share a common attribute are stored in a common data structure. The format of the record 200 shown in FIG. 6A is merely an example of a format that may be used to represent genomes in the genomic database 102. The amount of information stored for each record 200, the number of records 200, and the number of fields indexed may vary.
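A record of this general shape, together with an index keyed on one information field, can be sketched as follows. The class name, field names, and example values are hypothetical and are not taken from FIG. 6A; the sketch only mirrors the described layout of a Genome ID plus information fields, with each index entry holding an indexed value followed by the Genome IDs that share it.

```python
from dataclasses import dataclass, field


@dataclass
class GenomeRecord:
    """Hypothetical analogue of record 200: a Genome ID plus information fields."""
    genome_id: str
    info: dict = field(default_factory=dict)


def build_index(records, field_name):
    """Map each value of one information field to the Genome IDs sharing it."""
    index = {}
    for record in records:
        if field_name in record.info:
            index.setdefault(record.info[field_name], []).append(record.genome_id)
    return index


records = [
    GenomeRecord("G001", {"gnc": "b1", "snp1_genotype": "A/A"}),
    GenomeRecord("G002", {"gnc": "b7", "snp1_genotype": "A/G"}),
    GenomeRecord("G003", {"gnc": "4e", "snp1_genotype": "A/A"}),
]
snp1_index = build_index(records, "snp1_genotype")
print(snp1_index)
```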

[0179] Further, each information field 204 a-n may itself include one or more fields, and each of these fields may in turn include more fields, etc. Referring to FIG. 6B, an embodiment of the information field 204 a is shown. The information field 204 a includes a plurality of fields 206 a-m for storing more information about the information represented by information field 204 a. Although the following description refers to the fields 206 a-m of the information field 204 a, such description is equally applicable to information fields 204 b-n. For example, if information field 204 a represented a GNC of the genome corresponding to the Genome ID 202, then each of the fields 206 a-m may represent a portion of the GNC, a particular SNP of the genomic pattern from which the GNC was generated, a group of such SNPs, a description of the GNC, a description of one of the SNPs, etc.

[0180] The fields 206 a-m of the information field 204 a may store any kind of value that is capable of being stored in a computer-readable medium such as, for example, a binary value, a hexadecimal value, an integral decimal value, or a floating point value.

[0181] A user may perform a query on the genomic database 102 to search for genomic information of interest, for example, all genomes having a GNC that matches the GNC of a murder suspect. In another example, it may be known that a biological sample contains a particular sequence. That sequence can be compared with sequences in the database to identify information such as which individual the sample was isolated from, or whether the genetic sequence corresponds to a particular phenotypic trait. For example, the user may search the genomic database 102 for genetic matches to identify an individual, genotypes which correlate with a particular phenotype, genotypes associated with various classes of individuals, etc. Referring to FIG. 5, a user may provide user input 106 indicating genomic information for which to search to a query user interface 108. The user input 106 may, for example, indicate an SNP for which to search using a standard character-based notation. The query user interface 108 may, for example, provide a graphical user interface (GUI) which allows the user to select from a list of types of accessible genomic information using an input device such as a keyboard or a mouse.

[0182] The query user interface 108 generates a search query 110 based on the user input 106. A search engine 112 receives the search query 110 and generates a mask 114 based on the search query. Example formats of the mask 114 and ways in which the mask 114 may be used to determine whether the genomic information specified by the mask 114 matches genomic information of genomes in the genomic database 102 are described in more detail below with respect to FIG. 7. The search engine 112 determines whether the genomic information specified by the mask 114 matches genomic information of genomes stored in the genomic database 102. As a result of the search, the search engine 112 generates search results 116 indicating whether the genomic database 102 includes genomes having the genomic information specified by the mask 114. The search results 116 may also indicate which genomes in the genomic database 102 have the genomic information specified by the mask 114.

[0183] If, for example, the user input 106 specified a sequence of a gene, a GNC, or an SNP, the search results 116 may indicate which genomes in the genomic database 102 include the specified sequence, GNC, or SNP. If the user input 106 specified particular genetic information concerning a genome (e.g., enough to identify an individual), the search results 116 may indicate which individual genome listed in the genomic database 102 matches the particular information, thus identifying the individual from whom the sample was taken. Similarly, if the user input 106 specified genetic sequences which are not adequate to specifically identify the individual, the search results 116 may still be adequate to identify a class of individuals that have genomes in the genomic database 102 that match the genetic sequence. For example, the search results may indicate that the genomic information of genomes of all Caucasian males matches the specified genetic sequence.

[0184] FIG. 7 illustrates a process 300 that may be used by the search engine 112 to generate the search results 116. The search engine 112 receives the search query 110 from the query user interface 108 (step 302). The search engine 112 generates the mask 114 based on the search query 110 (step 304). The search engine 112 performs a binary operation on one or more of the records 104 a-n in the genomic database 102 using the mask 114 (step 306). The search engine 112 generates the search results 116 based on the results of the binary operation performed in step 306 (step 308).
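Steps 302 to 308 can be sketched as follows. The specification does not state which binary operation is used or how a GNC is encoded, so this sketch assumes, purely for illustration, that each GNC is stored as an integer whose bits mark the presence (1) or absence (0) of particular SNP alleles, and that the binary operation is a bitwise AND. The names `make_mask` and `search` and the sample database are hypothetical.

```python
def make_mask(query):
    """Step 304: turn a query {bit position: required bit} into two integers.

    `care` has a 1 at every position the query constrains; `want` holds
    the required bit value at those positions.
    """
    care = 0
    want = 0
    for pos, bit in query.items():
        care |= 1 << pos
        if bit:
            want |= 1 << pos
    return care, want


def search(database, query):
    """Steps 306 and 308: AND each stored code with the mask and collect matches."""
    care, want = make_mask(query)
    return [gid for gid, gnc in database.items() if (gnc & care) == want]


# Illustrative codes only: Genome ID -> assumed bit-encoded GNC.
database = {"G1": 0b1011, "G2": 0b0011, "G3": 0b1110}

hits_both_present = search(database, {0: 1, 3: 1})    # alleles 0 and 3 present
hits_one_absent = search(database, {1: 1, 2: 0})      # allele 1 present, 2 absent
```

Under this assumed encoding, positions the query does not mention are ignored, so a partial query can return a class of genomes while a fuller one narrows the result to a single genome.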

[0185] A computer system for implementing the system 100 of FIG. 5 as a computer program typically includes a main unit connected to both an output device which displays information to a user and an input device which receives input from a user. The main unit generally includes a processor connected to a memory system via an interconnection mechanism. The input device and output device also are connected to the processor and memory system via the interconnection mechanism.

[0186] One or more output devices may be connected to the computer system. Example output devices include a cathode ray tube (CRT) display, liquid crystal displays (LCD), printers, communication devices such as a modem, and audio output. One or more input devices may be connected to the computer system. Example input devices include a keyboard, keypad, track ball, mouse, pen and tablet, communication device, and data input devices such as sensors. The invention is not limited to the particular input or output devices used in combination with the computer system or to those described herein.

[0187] The computer system may be a general purpose computer system which is programmable using a computer programming language, such as, for example, C++, Java, or another language, such as a scripting language or assembly language. The computer system may also include specially programmed, special purpose hardware such as, for example, an application-specific integrated circuit (ASIC). In a general purpose computer system, the processor is typically a commercially available processor, of which the series x86, Celeron, and Pentium processors, available from Intel, and similar devices from AMD and Cyrix, the 680×0 series microprocessors available from Motorola, the PowerPC microprocessor from IBM and the Alpha-series processors from Digital Equipment Corporation, are examples. Many other processors are available. Such a microprocessor executes a program called an operating system, of which Windows NT, Linux, UNIX, DOS, VMS and OS8 are examples, which controls the execution of other computer programs and provides scheduling, debugging, input/output control, accounting, compilation, storage assignment, data management and memory management, and communication control and related services. The processor and operating system define a computer platform for which application programs in high-level programming languages are written.

[0188] A memory system typically includes a computer readable and writeable nonvolatile recording medium, of which a magnetic disk, a flash memory, and tape are examples. The disk may be removable such as, for example, a floppy disk or a read/write CD, or permanent, known as a hard drive. A disk has a number of tracks in which signals are stored, typically in binary form, i.e., a form interpreted as a sequence of ones and zeros. Such signals may define an application program to be executed by the microprocessor, or information stored on the disk to be processed by the application program. Typically, in operation, the processor causes data to be read from the nonvolatile recording medium into an integrated circuit memory element, which is typically a volatile, random access memory such as a dynamic random access memory (DRAM) or static memory (SRAM). The integrated circuit memory element allows for faster access to the information by the processor than does the disk. The processor generally manipulates the data within the integrated circuit memory and then copies the data to the disk after processing is completed. A variety of mechanisms are known for managing data movement between the disk and the integrated circuit memory element, and the invention is not limited to any particular mechanism. It should also be understood that the invention is not limited to a particular memory system.

[0189] The invention is not limited to a particular computer platform, particular processor, or particular high-level programming language. Additionally, the computer system may be a multiprocessor computer system or may include multiple computers connected over a computer network. It should be understood that each module (e.g. 108, 112) in FIG. 5 may be a separate module of a computer program, or may be a separate computer program. Such modules may be operable on separate computers. Data (e.g. 102, 106, 110, 114, and 116) may be stored in a memory system or transmitted between computer systems. The invention is not limited to any particular implementation using software, hardware, firmware, or any combination thereof. The various elements of the system, either individually or in combination, may be implemented as a computer program product tangibly embodied in a machine-readable storage device for execution by a computer processor. Various steps of the process, for example, steps 302, 304, 306, and 308 of FIG. 7, may be performed by a computer processor executing a program tangibly embodied on a computer-readable medium to perform functions by operating on input and generating output. Computer programming languages suitable for implementing such a system include procedural programming languages, object-oriented programming languages, and combinations of the two.

[0190] The invention also encompasses compositions. One composition of the invention is a plurality of RCGs immobilized on a surface, where the plurality of RCGs are prepared by DOP-PCR. Another composition is a panel of SNP-ASOs immobilized on a surface, wherein the SNPs are identified by using RCGs as described above.

[0191] The invention also includes kits having a container housing a set of PCR primers for reducing the complexity of a genome and a container housing a set of SNP-ASOs, particularly wherein the SNPs are present with a frequency of at least 50 or 55% in a RCG made using the primer set. In some kits, the set of PCR primers are primers for DOP-PCR, and preferably the DOP-PCR primer has the tag-(N)_(x)-TARGET structure described herein, i.e., wherein the TARGET includes at least 7 arbitrarily selected nucleotide residues, wherein x is an integer from 3 to 9, wherein each N is any nucleotide residue, and wherein tag is a polynucleotide as described above. In some embodiments the SNPs in the kit are attached to a surface such as a slide.
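The tag-(N)_(x)-TARGET layout recited above can be written out as a short sketch that assembles a primer string and checks the stated constraints (x from 3 to 9, TARGET of at least 7 residues). The function names `dop_primer` and `degeneracy` and the example tag and TARGET sequences are hypothetical; they are not primers from this specification. N is written using the standard IUPAC code for any nucleotide.

```python
def dop_primer(tag, x, target):
    """Assemble a primer string laid out as tag-(N)x-TARGET, written 5' to 3'."""
    if not 3 <= x <= 9:
        raise ValueError("x must be an integer from 3 to 9")
    if len(target) < 7:
        raise ValueError("TARGET must include at least 7 nucleotide residues")
    if set(tag + target) - set("ACGT"):
        raise ValueError("tag and TARGET must contain only A, C, G, T")
    return tag + "N" * x + target


def degeneracy(x):
    """Number of distinct sequences represented by x fully degenerate positions."""
    return 4 ** x


# Hypothetical tag and TARGET, for illustration only.
primer = dop_primer("CTCGAG", 6, "ATGTGGCA")
```

Because every N position can be any of four residues, a primer with x degenerate positions stands for 4^x individual oligonucleotides that all share the same tag and TARGET.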

[0192] SNPs identified according to the methods of the invention usingthe B1 5′ rev primer include the following: locus ASO Allele Strain (SEQID#) 1 tttatgAaggCataaaaa A 129/ 14 tttatgGaggCataaaaa B BS-DBA 15tttatgAaggTataaaaa C Spre 16 2 ctgggctgTattcattt A 129-DBA 17ctgggctgCattcattt B B6 18 tctGcctccTGagtgct C B6-129-DBA 19tctAcctccCAagtgct D Spre 20 3 tagctagaAtcaagctt A BG 21tagctagaGtcaagctt B DBA-Spre 22 4 gctgtgcAACaaatcac A 129/ 23cagctgtgc---aaatcacc B B6 24 5 tttcgtga-tgtttctat A 129-Spre 25tttcgtgaAtgtttcta B BG-DBA 26 6 cactgtctAcatcttta A B6-129 27cactgtctCcatcttta B DBA-Spre 28 7 taacattcTtgaagcca A 129-DBA-Spre 29taacattcCtgaagcca B B6 30 8 gcttccaTttcctaagg A 129-DBA 31gcttccaCttcctaagg B B6 32 9 aggaatgGcAataatcc A B6-129 33aggaatgGcGataatcc B DBA 34 aggaatgAcAataatcc C Spre 35 ttaaattcGtaaatggaD BG-129-DBA 36 ttaaattcAtaaatgga E Spre 37 10 taacattcTtgaagcca A129-DBA-Spre 38 taacattcCtgaagcca B B6 39 11 ttcTGtgActccaCttg A 129 40ttcTGtgActccaTttg B BG-DBA 41 ttcCCtgTctccaTttg C Spre 42 12gtagtttgCcaggaacc A 129-Spre 43 gtagtttgTcaggaacc B BG-DBA 44 13tgctactcctctctactcg A 129 45 tgctattcctctctgctcg B BG-DBA-Spre 46cttgatcaccctctgatga C BS-129--DBA 47 cttggtcaccctctaatga D Spre 48 14gaggtggtgcagagtga A 129-DBA 49 gaggtggcgcagagtga B B6 50gaggtggcccagagtga C Spre 51 15 cccactgaaccgcacag A 129-DBA 52cccactgagctgcacag B B6 53 cccactcagccgcacag C Spre 54 16tgaagacacagccagcc A 129-DBA 55 tgaagacgcagccagcc B B6 56tgaagacgaagccagcc C Spre 57 17 agaagttggtaccaggg A 129/FVB/F1/cast/spre58 agaagttgttaccaggg B B6 59 18 tatgattacgtaatgtt A 129/B6/F1 60tatgattatgtaatgtt B FVB/F1 61 19 atgattccagtgagtta A 129/B6 62atgattcctgtgagtta B FVB/F1 63 catactattaacactggaa C Cast-129 64catattattaacacaggaa D Spre 65 20 gtcaagaacaggcaata A 129/bG/fl/FVB 66gtcaagaataggcaata B f1 67 cagactagggaaccttc C 129 68 cagacgagggaaccttc ESpre 69 cagactagggagccttc D Cast 70 21 tgtccagttgtttgcat A 129/ 71tgtccagtcgtttgcat B b6/fvb/f1 72 ggggtagccagtttggt C Cast-129 73ggggtagcaagtttggt D 
Spre 74 22 caggaagctgtagctcc A 129/f1 75caggaagccgtagctcc B bE/fvb 76 cctgagcctgtctacct C Cast-129 77cctgagcccgtctacct D Spre 78 23 taacattcttgaagcca A 129/FVB/F1/cast/spre79 taacattcctgaagcca B B6 80 24 ccaactgaaccgcacag A 129/FVB 81ccaactgagctgcacag B B6 82 gagctagctcacacattct C Cast-129 83gagttagctcacacgttct D Spre 84 25 acgggggggtggcgtta A 129/f1 85acggggggtggcgttaa B bG/fvb/cast/spre 86 tagacagccagcgcgtcac C Cast-12987 tagatagccagcgcatcac D Spre 88 26 gcttttcttgagagtggc A 129/b6 89gcttttctttagagtggc B fvb 90 gcttttcgtgagagtggc C f1 91 27ctacagataaagttata A 129/bS/fvb/f1 92 ctacagatgaagttata B f1 93tagacctgctgctatct C Cast-129 94 tagacctgttgctatct D Spre 95 28tgttgttctggcctcca A 129/F1 96 tgttgttttggcctcca B B6 97ttctgagaatttgttag C 129/B6 98 ttctgagagtttgttag D F1/spre 99 29caggaagcagtagctcc A 129 100 caggaagccgtagctcc B BG/FVB/F1 101agagtcaggtaagttgc C Cast-129 102 agagtcagataagttgc D Spre 103 30agatttcaaaaagtttt A 129/b6 104 agattccaaaaggtttt B f1 105agatttcaaaaagtttt C fvb 106 cctgaggggagcaatca D Cast-129 107cctgagggaagcaatca E Spre 108 31 aaggtaagataactaag A 129.f1 109aaggtaaggtaactaag B b6/fvbn 110 ggactacacagagaaac C Cast-129 111ggactacatagagaaac D Spre 112 32 cccaggctacacgaggg A 129/fVb/f1 113cccaggctacatgaggg B b6 114 cttaccagttgtgagac C 129 115 cttaccacttgtgagacD Spre 116 cttaccagtcgtgagac E Cast 117 33 ctgccctcaggtcttta A 129 118ctgccctccggtcttta B b6/fvbn 119 gcaataaaattgtttta C Cast-129 120gcaatgagatcgtttta D Spre 121 34 tgttctgtggagacccc A129/fvbn/f1/cast/spre 122 tgttctgtagagacccc B b6 123 35cacattgaatcaaagcc A 129/bG/fvbn/f1 124 cacattgagtcaaagcc B f1 125ggactacccacccgttc C 129 126 gcgactgc--acccattct E Spre 127gcgactgccccc--attct D Cast 128 36 cctgggccagccaggaa A 129/b6/cast 129cctgggcctgccaggaa B fvbn/f1/spre 130 37 ccccaggtaaccatctt A 129/f1 131ccccaggtgaccatctt B b6/fvbn/cast/spre 132 ttctgtatattagctga C Cast-129133 tttctatattaa--ctgac D Spre 134 38 ggacccggacggtcttc A 129/b6 135ggacccggtcggtcttc B bvb/f1 136 gtccctaatgttagcat C Cast-129 
137gtccccaatgtcagcat D Spre 138 39 acgggggggtggcgtta A 129/f1 139acgggggg-tggcgttaa B b6/fvbn/cast/spre 140 tagacagccagcgcgtcac C Cast141 tagatagccagcgcatcac D Spre 142 40 gattcttcgtgttcctt A 129-b6-F1 143gattcttcatgttcctt B FVBN-Cast-Spre 144 41 tgtaaaaacttagaata A 129/b6/f1145 tgtaaaaatttagaata B fvbn/cast/spre 146 42 tgtgaaagcgctcccaa A129/fvbn/f1/cast/spre 147 tgtgaaagtgctcccaa B b6 148 43caaaggctcagagaatc A 129/b6/f1 149 caaaggcttagagaatc B fvbn 150ttaattctctccaaaca C 129/b6/fvb/f1 151 ttaaggctctccggaca D f1 152 44ctgccaccgtgcacaca A 129/b6 153 ctgccaccatgcacaca B fvbn/f1 154ccaaatattctgattcc C 129-Spre 155 ccaaatattcttttttt D Cast 156 45atgagctgaccctccct A 129/BG/F1 157 atgagctgcccctccct B FVB 158acactaggtaaaagctc C 129/BS/FVB/F1 159 acactaggcaaaagctc D F1 160agacaccacgaccgagg E 129-Spre 161 agacaccaagaccgagg F Cast 162 46gcagcgtccggttaagt A 129/f1 163 gcagcgtctggttaagt B bG/fvbn/f1 164cagatactacaaggatg C 129 165 tacagatac---aaggatgc D SPRE/Cast 166 47tcagctagtgtatctgt A 129/FVB/F1 167 tcacctagtgtatttgt B B6/F1 168ttttttatttttggatt C 129-Cast 169 tttt-aatttttggattt D Spre 170 48gatattgttttcattta A 129/ 171 gatattgtcttcattta B b6/fvbn/f1 172 49agacccggtgctggtgt A 129/b6 173 agacccggcgctggtgt B fvbn/f1/cast 174 50cttctaagctttgtctt A 129/fvb/f1/cast/spre 175 cttctaagttttgtctt B b6/f1176 51 agttggcaaccagcatg A 129/ 177 agttggcatccagcatg B b6/fvbn/f1 178ggtgaaatggtaattac C 129-Cast 179 ggtgaaatagtaattac D Spre 180 52acgggatataacgagtt A 129/FVB/F1 181 acgggatacaacgagtt B BG/cast/spre 182gggatacaacgagtttc C 129-Cast 183 gggatacaccgagtttc D Spre 184 53gtatcttgggtgtcctg A 129/FVB/F1 185 gtaacttgggtgttctg B B6/F1/spre 186gggtgtcctgccccatc C 129 187 gggtgttctgttttatc D Spre 188 54tgtccagttgttttgca A 129 189 tgtccagtcgttttgca B B6/FVB/F1/spre 190aagacagccggaactct C 129 191 aagacagcaggaactct D Spre 192 55tgataggaccaaagaga A 129/b6/f1 193 cgataggactaaagaga B fvbn/f1 194tccaaagccagggccca C 129 195 tccaaattcagggccca D Spre 196 56cctgggccagccagaag A 129/B6/cast 197 
cctgggcctgccagaag B FVB/F1/spre 19857 gattctctgagcctttg A 129/b6/f1 199 gattctctaagcctttg B fvbn 200taccattttttagatga C 129 201 taccatttcttagatga D Spre 202 58ctggaagggcagtgaat A 129 203 tctggacgagggtgaat B B6/FVB 204 59tagttgcagcacaaatg A 129/B6 205 tagttgtagcacaaatg B FVB/F1 206 60acactaccgcacagagc A 129/b6/fvbn/f1 207 acactaccacacagagc B f1 208aataataagtaaataag C 129/ 209 aataataaataaataag D cast 210 61tggcagtagttgttcat A 129/b6 211 tggcagtaattgttcat B fvbn/f1 212aggtatgacgtcataag C 129-Cast 213 aggtatgatgtcataag D Spre 214 62gttgttgttgaagattt A 129/fvbf1/f1 215 ttgttgttg---aagattta B b6/f1 216gatagtacaggtgttgtca C 129... 217 gatggtacaggtgtcgtca D Spre 218 63aatataatgtaacagga A 129/F1 219 aatataatataacagga B BS/FVB/F1 220 64ttaaccatttatctgat A 129/FVB 221 ttaaccatatatctgat B B6/F2 222 65agagcccagcaaagttc A 129/B6 223 agagcccaacaaagttc B FVB/F1 224atcccgaaccggggaaaat C 129-b6 225 atcccaaaccgggggaaat D cast-spre 226 66atgacaccaccacaacc A 129 227 atgacaccgccacaacc B B6/FVB/F1 228 67aggcaaacagatataac A 129/FVB/F1 229 aggcaaacggatataac B BE/cast/spre 230tgtattcactaataaga C 129-Cast 231 tgtattcattaataaga D Spre 232 68ttggcgtatacttcata A 129/BG/F1 233 ttggcgtacacttcata B FVB 234ctcaccacgctccatct C 129 235 ctcaccaccctccatct D Cast-Spre 236 69atatctaaa----ggcacag A 129/FVB 237 tatctacataaaggcac B B6/F1/cast/spre238 gtgtctcctagtctccc C B6-Cast 239 gtgtctcccagtctccc D Spre 240 70atgagctgaccctccct A 129/B6/F3 241 atgagctgcccctccct B FVB/F1 242ggacaacatttaattgg C 129-Cast 243 ggacaacacttaattgg D Spre 244 71gctttaaaatttttatt A 129 245 gctttaaattttttatt B B6/FVB/F1 246aaatttgttcctaaatg C 129 247 aaatttgtacctaaatg D Cast-Spre 248 72gtgttgttctggcctcc A 129/FVB/spre 249 gtgttgttttggcctcc B B6/F1 250 73tgaatgacaaaaagaca A 129/B6/FVB 251 tgaatgacgaaaagaca B F1/cast 252 B25′Rev ACTGAGCCATCTCWCCAG W=A+T 101 acttaacttaagctggc A 129/ 253gtacttaa-----gctggcctg B b6/fvb/f1 254 102 actctaatatcccacag A129/fvbn/f1 255 actctaatctcccacag B b6 256 cggatcggctctagttc C 129/cast257 
cggatcagctctagttc D spre 258 103 tcaaaccaataaggagg A 129/b6/fvb/f1259 tcaaaccagtaaggagg B f1 260 104 gtgtgtgtgtggggggg A 129/f1 261gtgtgtgtg---gggggggt B b6/fvbn 262 cttaataataatttcat C 129/cast 263cttaataacaatttcat D spre 264 105 gtgtctccatatgtgtg A 129/b6/f1 265gtgtctacacatgtgtg B fvbn 266 106 aactcatcatgatggtt A 129/ 267aactcataatgatggtt B b6/fvbn/f1 268 aactcatcacgatggtt C cast 269atcactcatagcccaga D 129/ 270 atcacttatagcccaga F spre 271atcactcatatcccaga E cast 272 107 catcttaccagcattga A 129/cast/spre 273catcttactagcattga B b6/fvbn/f1 274 108 agtcagccggctctggc A 129/b6/f1 275agtcagccagctctggc B fvbn/f1 276 gggtaggagtggggatgag C 129/ 277gggcaggagtgggggtgag E spre 278 gggtaggagtgggggtgag D cast 279 109tcagtattgttcttctc A 129/f1/spre 280 tcagtatttttcttctc B b6/fvbn/f1/cast281 110 agcagagactgagctcg A 129/ 282 agcagagaccgagctcg B b6/fvbn/f1 283acaggggtcgattcgtc C 129/b6/fvbn/f1/cast 284 acagggatcgattcgtc E spre 285acaggggtcgtttcgtc D f1 286 111 tcccaaagcattcaagg A 129/b6/f1 287tcccaaagtattcaagg B fvbn/f2 288 gaccagggttaatgact C 129/b6 289gaccagggctaatgact D cast/spre 290 112 ctattaacagagtcgag A 129/b6/f1 300ctattaacggagtcgag B fvbn 301 gtgatactggatgtctg C 129/b6 302gtgataccg-atgtctgg D cast/spre 303 113 ctctctcgatagtctaa A 129/f1 304ctctctcgctagtctaa B b6/fvbn/f1/cast 305 tctctcgatagtctaat C 129/ 306tctctcgctggtctaat D cast 307 114 agatgcaaaattcttag A 129/ 308agatgcacagttcttag B b6/fvbn/f1 309 115 ggaaaatgctcaggtag A129/f1/cast/spre 310 ggaaaatgttcaggtag B b6/fvbn 311 116tctgggcagagtgCagg A 129/ 312 tctgggcagcgtgcagg B b6/fvb/f1 313 117tatggaacggttgcttc A 129/fvb 314 tatggaactgttgcttc B b6/f1 315aagcctggtacccgctg C 129/cast 316 aagcctggcacccgctg D spre 317 118cattcttctttttctga A 129/ 318 cattcttcgttttctga B b6/fvbn/f1/cast/spre319 ctgcaggcttgtctgtg C 129/CAST 320 ctgcaggtttgtctgtg D spre 321 119tgccatttcctataaca A 129/f1 322 tgccatttgctataaca B b6/fvbn 323 120ccgccacacccgctcct A 129/b6 324 ccgccacagccgctcct B fvbn/f1 325 121caaataatgctagttat A 129/b6/f1 326 
caaataatgttagttat B fvbn 327 122ggatgttgacacgctac A 129/fvbn/f1 328 ggatgttgtcacgctac B b6/f1 329catgtgtc-caacgccat C 129/ 330 catgtgtcacaacgcca D cast/spre 331 123aaaggggccttaaagga A 129/fvbf1/f1 332 aaaggggctttaaagga B b6 333tgaaaagttcttttcat C 129/cast 334 tgaaaagtacttttcat D spre 335 124cctctctatgtgtgagc A 129/b6/f1 336 cctctctacgtgtgagc B fvbn 337gaagttttaggagattct-t C 129/ 338 gaagatttaggagagtctc D spre 339 125agggatgtattttgtta A 129/fvbn/f1 340 agggatgtgttttgtta B b6 341acaattcaaatgtatat C 129/cast 342 acaattcatatgtatat D spre 343 126cttgcctaacctgcaca A 129/b6/f1 344 cttgcctagcctgcaca B fvbn 345caacagc---acctcatatc C 129/b6/cast 346 acagcggtgcctcgtat D spre 347 127actcacagtgtcagggc A 129/fvbn/f1/spre 348 actcacagcgtcagggc B b6/cast 349128 ggctgctcctgtgtgtctg A 129/fvbf1/f1/cast 350 ggctcttcctgtgtgtctg B b6351 ggctgctcctgtgtttctg C spre 352 129 aatagatgcccttctga A 129/f1 353aatagatgccctcttga B b6/fvbn 354 aatcgatgcccttctga C spre 355 130ttggtctagcaggtagc A 129/fvbf1/f1 356 ttggtctaccaggtagc B b6 357agccttggctcttaaaa C 129/cast 358 agccttggttcttaaaa D spre 359 131agtctctggcgcctttg A 129/fvbn/f1/Cast/spre 360 agtctctgccgcctttg B b6 361132 tagcaggaggcacagctta A 129/ 362 aagcaggaggcacaactta B b6 363aagcaggaggcacagctta C fvb/f1/CAST 364 tagcaggaggcacagcttg D spre 365 133aggagagaccggactcc A 129/fvb/f1 366 aggagagagcggactcc B b6 367 134tacaagtcatccttcct A 129/b6/f1 368 tacaagtcgtccttcct B fvbn/f1 369atacctccctcagacaa C 129/cast 370 atacctcc-tcagacaag D spre 371 135aaacaaacaaacaaacc A 129/b6/f1/cast/spre 372 aaacaaaccaacaaacc B fvbn 373gtgcgccaccatgacca C 129/cast 374 gtgcgccatcatgacca D spre 375 136ggctttcccattagtgg A 129/ 376 ggctttcctattagtgg B b6/fvbn/f1 377ccctcacctctctctca C 129/cast 378 ccctcacccctctctca D spre 379 137aatctctcgcgttcatt A 129/fvbn/f1 380 aatctctcacgttcatt B b6 381 138aatgataccgatcctta A 129/f1 382 aatgatacagatcctta B b6/fvbn 383ataaaactgcattcgtg C 129/b6 384 ataaaactacattcgtg D cast/spre 385 B1MuschAGTTCCAGGACAGCCAGG 201 atatctccgactttgaa A 
129/cast 386atatctccaactttgaa B b6/fvb/f1/spre 387 tggccctgcagagtctg C 129-Cast 388tggctctgcagag-ctgg D Spre 389 202 caatggatc---aaagatgc A 129-FVB-F1 390atggatcaacaaagatg B B6 391 gctgcctc--aaggtataa C 129/be 392ctgcctcttaaggtata D cast/spre 393 203 acctatggctcctcatc A 129/b6/f1 394acctatggttcctcatc B fvb 395 tcttctcccctgcttta C 129-Cast 396tcttctcac-tgctttag D spre 397 204 ccgc-ataaaaagctgag A FVB-F1 398ccgccataaaa-gctgag B B6-F1 399 agaatatagggtttttt C 129/cast 400tagaatacag--ttttttt D spre 401 205 agagttgctgtgcaggg A 129/b6/f1 402agagttgccgtgcaggg B fvb/cast 403 agagttgcagtgcaggg C spre 404 206taagcagtgttcttggc A 129-B6-F1 405 taagcagtattcttggc B FVBN 406tcttctcccctgcttta C 129/cast 407 tcttctcac-tgctttag D spre 408 207tttttttttattattga A 129/fvb/f1 409 tttttttt-attattgaa B b6 410tgtggtacgcacatctg C 129-Cast 411 tgtggtacacacatctg D Spre 412 208agactcttagacttctg A 129/f1 413 agactcttaggcttctg B b6/fvb/f1 414agactcataagcttctg C spre 415 agactcttaggcttctg D cast 416 209cacgtacccgaacgtga A 129-B6 417 cacgtacctgaacgtga B FVB-F1 418attacggtttgtcgtca C 129/CAST 419 attacggttggtcgtca D spre 420 210ccaagatacgaaaccag A 129/f1/cast/spre 421 ccaagatatgaaaccag B b6 422 211tgcaatgaccagcaacc A 129/b6 423 tgcaacgaccagcaacc B fvb/f1/cast 424tgtaacgaccaacaact C spre 425 212 tctaaagggaaagatgg A 129-FVB 426tctaaagg-aaagatgga B B6-F1 427 213 ctggactcatacataca A 129-FVB-F1 428ctggactcgtacataca B B6-F1-Cast/SPRE 429 agtttggtcccctggac C129/FVB/BG-F1-Cast 430 agtttggtttcctggac D Spre 431 214tatagcttcatgtaaaa A 129/fvb/f1/cast/spre432 tatagctttatgtaaaa B b6 433215 tttttttt-attattgaa A 129 434 tttttttttattattga B B6-FVB-F1 435actcattgccaatttaa C 129 436 actcattcagaatttaa D spre/CAST 437 216atgcgtaatgggggcta A 129 438 atgcgtaacgggggcta B bS/fvb/f1/cast/SPRE 439ataattgctcttttaaa C 129/b6/fvb/f1/cast 440 gtaattgctcttttaaa D spre 441217 tctgattagtgatggat A 129-F1 442 tctgatta-tgatggatt B B6 443agcagagtgtctcgtaa C 129 444 agcagagtatctcgtaa D spre/CAST 445 218gctggcagatatcggta A 129/b6/f1 446 
gctggcaggtatcggta B fvb/cast 447 219aactgcaatgaccagca A 129-B6 448 aactgcaacgaccagca B FVB-F1 449gctggtcattgcagttt C 129 450 gttggtcgttacagttt D spre 451gctggtcgttgcagttt E cast 452 220 gctggcagatatcggta A 129-B6-F1 453gctggcaggtatcggta B FVB 454 atagaaagtccaccgtc C 129/cast 455atagaaagcccaccgtc D spre 456 221 ttagtgaccgtgtaaac A 129/b6/f1 457ttagtgactgtgtaaac B fvb 458 ggggaggagctttgttc C 129-Cast 459ggggaggatctttgttc D Spre 460 222 ggcctggacacaaaagc A 129/fvb/f1 461ggcctggaaacaaaagc B b6 462 cccttttctagtattgt C 129 463 cccttttccagtattgtD Cast-Spre 464 223 gaattggttttaggaat A 129-F1-Cast-Spre 465gaattggtattaggaat B B6 466 224 acccagctttccatggt A 129/f1 467acccagctctccatggt B b6/fvb/CAST 468 225 tcacgttcgggtacgtg A 129/b6/f1469 tcacgttcaggtacgtg B fvb/f1 470 tgccttccggttggcaa C 129-Cast 471tgccttccagttggcaa D spre 472 226 ttttatcatacaattgc A 129-F1 473ttttatcagacaattgc B B6-FVB-F1 474 227 atcttctcttctttgag A 129/f1 475atcttctcctctttgag B b6/fvb 476 cagtcctctgctttctC C 129-Cast 477cagtcctcagctttctc D Spre 478 228 ccaagatacgaaaccag A 129/f1/spre 479ccaagatatgaaaccag B b6 480 229 ggtattcaagggttact A 129/cast/spre 481ggtattca-gggttactg B b6/fvb 1bp del. 
482 230 acctatggctcctcatc A129/b6/f1/cast 483 acctatggttcctcatc B fvb 484 231 ttttatcatacaattgc A129/f1 485 ttttatcagacaattgc B b6/fvb 486 232 aaccagggcttaagtct A 129487 aaccagggattaagtct B b6/fvb/f1 488 cagaaaaacagatatac C 129-BG-FVB-F1489 cagaaaaagagatatac D Spre 490 234 tctgagcgtgagtgctg A 129/fvb 491tctgagcgcgagtgctg B b6/f1/cast/spre 492 acctcagaagcggaggt C129-B6-FVB-F1 493 acctcggaaggggaggt D Spre 494 acctcggaagcggaggt E Cast495 235 taactcgatcgctatca A 129-BG-F1 496 taactcgcttgctatca B FVBN-Cast497 taactcgctcgctatca C Spre 498 236 gaatttctcaacttctt A 129/fvb/f1/spre499 gaatttctgaacttctt B b6/f1 500 237 caggggtccccaatttg A 129/f1/SPRE500 caggggtctccaatttg B b6/fvb 501 238 ttttgctgtgc-aggcta A 129-B6-F1502 ttttactgtgccaggct B FVB 503 gacagccctgtctcaaa C 129/cast 504agagaaaccctgtctca D spre 505 239 gcaccggtctgagcagt A 129/f1 506gcaccggtttgagcagt B b6/fvb/f1 507 ccgtgcccctgaacaat C 129-B6-FVB-F1-Cast508 ccgtgcccttgaacaat D Spre 509 240 tcacgttcgggtacgtg A 129/b6/f1 510tcacgttcaggtacgtg B fvb/f1 511 tgattcgctgggactct C 129-Cast 512tgattcgccgggactct D Spre 513 241 ttgatatccgaggcctt A 129/bE/fvb/f1 514ttgatatctgaggcctt B f1/cAST/SPRE 515 242 tccctgggccaagcata A 129/b6/fvb516 tccctgggtcaagcata B f1 517 243 ttatggctgaggatcac A 129-B6-F1-Cast518 ttatggctgcggatcat B FVB 519 ttatggcaggggatcac C Spre 520 244ctctctgcgctgaagca A 129/b6 521 ctctctgctctgaagca B fvb/f1 522agatacagagatgtgtt C 129-BE-FVB-F1 523 agatactgaggtgtgtt D Spre 524 245cgacatctggcagatgt A 129/f1 525 cgacatctagcagatgt B b6/fvb 526gtcacaaatagtatttc C 129/cast 527 gtcacaaagagtatttc D spre 528 246aaggtgtgtgcgtgtgt A 129/f1 529 aaggtgtgcgcgtgtgt B fvb 530 247agtcttttttttcctga A 129-B6-FVB 531 tagtc-tttttttt-cctgaa B F1 532 248caggctgtgggaggctt A 129/b6/f1 533 caggctgcggaaggctt B fvb 534ctgtaagtcattcaata C 129-B6-FVB-F1-Cast 535 ctgtaagtaattcaata D Spre 536249 caggggtccccaatttg A 129/f1 537 caggggtctccaatttg B b6/fvb 538 250gactcatggccgccttg A 129 539 gactcattgccgcctgg B B6-FVB-F1 540gactcctggccgcctgg C F1 541 
gactcctggctgcctgg D Spre 542gactcctggccgcctgg E Cast 543 251 acagggga-ggaaggaag A 129 544acaggggaaggaaggaa B b6/fvb/f1 545 252 ttgatatagattgattc A 129/b6/f1 546ttgatatatattgattc B fvb/f1 547 atagaacagcaaagtaa C 129-B6-FVB-F1-Cast548 atagaacaacaaagtaa D Spre 549 253 aacaagcatctatggat A 129/fvb/f1 550aacaagcacctatggat B b6 551 DOP 300 gagcaggttaagcgatg A 129/ 552gagcaggtgaagcgatg B B6 553 301 ggcttccagcttgattc A 129/ 554ggcttccaacttgattc B B6 555 302 agatagggatgaatccc A 129/ 556agataggggtgaatccc B B6 557 303 tcattcaccgtttattg A 129/ 558tcattcactgtttattg B B6 559 304 ctgacatactgcttagg A 129/ 560ctgacatattgcttagg B B6 561 305 ctaggaaagcctaaatt A 129/ 562ctaggaaaacctaaatt B B6 563 306 atgtcaggattttaaga A 129/ 564atgtcagggttttaaga B B6 565 307 ggtttccaattggaaag A 129/ 566ggtttccagttggaaag B B6 567 308 cgaggagtgcaaagcga A 129/ 568cgaggagtccaaagcga B B6 569 309 tgtgtgtgtgtctgtct A 129/ 570tgtgtgtgcgtctgtct B B6 571 310 gcaagatgcagctgcat A 129/ 572gcaagatgtagctgcat B B6 573 311 gctggggctattctgta A 129/ 574gctggggccattctgta B B6 575 312 caataacggacctgcct A 129/ 576caataacgaacctgcct B B6 577 313 tagcctctctacatagg A 129/ 578tagcctctgtacatagg B B6 579

[0193] Other SNPs identified using the BJ1 DOP-PCR Primer include: SNPspresent within DOP-PCR using primer BJ1 Genotype of CEPH individuals:ASO name ASO sequence 12-01 104-01 884-01 1331-01 SEQ ID # 3A-GCATCTATAGGTTCACT GT TT TT TT 580 3A-T CATCTATATGTTCACTT 581 5A-CGCCAACAACATTGAGA GG CG GG GG 582 5A-G GCCAACAAGATTGAGAG 583 7A-CGGGTCGTGCGTCCCCC TT CT TT TT 584 7A-T GGGTCGTGTGTCCCCCT 585 9A-AATTGTCTCACATTTCT AA GG AA AA 586 9A-G ATTGTCTCGCATTTCTT 587 12A-CGGTGTGGTCGCAGPAG CC CC CT CT 588 12A-T GGTGTGGTTGCAGAAGG 589 15A-ATCATTGCCACACTTG AA GG AA GG 590 15A-G TCATTGCCGCACTTGPA 591 20A-AATCTGTCTACAATGAT AG GG AA AG 592 20A-G ATCTGTCTGCAATGATC 593 22A-AGGCTGGGCACAGTGGC AA GG AA AA 594 22A-G GGCTGGGCCCAGTGGCT 595 34A-ACAGCCTGGAGAACAAG CC CC CC AC 596 34A-C CAGCCTGGCGAACAAGT 597 39A-CTTTGACACCCGGAAGC CT CC CC CC 598 39A-T TTTGACACTCGGAAGCT 599 40A-CCTGCCTTTCATACTGC CT TT CT TT 600 40A-T CTGCCTTTTATACTGCC 601 40B-CACAATAGACGTTCCCC TT CT TT CT 602 40B-T ACAATAGATGTTCCCCG 603 41A-AGGTGTTTGATTTGTAC CC AC CC CC 604 41A-C GGTGTTTGCTTTGTACT 605 42A-ATCCAACTCAAAAAATG AT AA AT AT 606 42A-T TCCAACTCTAAAAATGT 607 44A-CGGGCCGCTCACAGTCC CC CT CC CC 608 44A-T GGGCCGCTTACAGTCCA 609 44B-CGCATGGCTCGTGGGTT CT CT TT CT 610 44B-T GCATGGCTTGTGGGTTT 611 46A-GGTTGGGAAGTGGAGCG GG TT GG TT 612 46A-T GTTGGGAATTGGAGCGG 613 50A-AAAGGGATGAGGATGTG AG AA AA AG 614 50A-G AAGGGATGGGGATGTGA 615 50B-ATCCTCGAGAGCTTTGC AG AG AA AG 616 50B-G TCCTCGAGGGCTTTGCT 617 51A-CTGACAATGCGTGCCC CT CC CC CC 618 51A-T TGACAATGTGTGCCCAA 619 53A-ATCCATGTCATAGATTT AG AA AA AA 620 53A-G TCCATGTCGTAGATTTC 621 66A-ATGGAGGACAGTGGAGGG TT TT TT AT 622 66A-T TGGAGGACTGTGGAGGG 623 69A-CACCCATTTCCTGAAAA TT CT TT TT 624 69A-T ACCCATTTTCTGAAAAT 625 71A-GCTGAGTTCGGCACTGC TT GG GG TT 626 71A-T CTGAGTTCTGCACTGCT 627 71B-GACCAGTTTGGCTCAAA GG TT TT GG 628 71B-T ACCAGTTTTGCTCAAAG 629 72A-ACCAATCAGAACGTGCA AA GG GG AA 630 72A-G CCAATCAGAGCGTGCAG 631 73A-AACCCACACAGACACTG AA AT TT AT 632 73A-T ACCCACACTGACACTGC 633 81A-CGGACAAAGCGCTGGTG TT CT 
CC CT 634 81A-T GGACAAAGTGCTGGTGT 635 81C-CAGCTGGTCCCCCTMCCC TT CT CC CC 636 81C-T AGCTGGTCTCCCTMCCC 637 90A-AGGTGTAGTAAGCACAG AA AA AC AA 638 90A-C GGTGTAGTCAGCACAGC 639 91A-CAGCGAACACGGGGG CC CC TT CC 640 91A-T AGCGAACATGGGGGAAA 641 98D-AGTGACAGCACCAAACT GG AG GG GG 642 98D-G GTGACAGCGCCAAACTT 643 101A-CGTCTGTTGCTGTTATT TT TT TT CT 644 101A-T GTCTGTTGTTGTTATTT 645 111A-AACCAGCATAGCCCAGA GG GG GG AG 646 111A-G ACCAGCATGGCCCAGAG 647 111B-ACGTAGGAGACAAGACC GG GG GG AG 648 111B-G CGTAGGAGGCAAGACCT 649 117A-ACTCTGCTGAATCTCCCA GG GG AG 650 117A-G CTCTGCTGGATCTCCCA 651 124A-AAAGCAAAGACTGATTC TT AT TT TT 652 124A-T AAGCAAAGTCTGATTCA 653 125A-AAGGCAGCTAGAGGGAG CC AA AC AA 654 125A-C AGGCAGCTCGAGGGAGA 655 130C-CTTCCATTCCGTTCAAT TT TT TT CC 656 130C-T TTCCATTCTGTTCAATT 657 130D-CTATTGTTACTGATTTT CT CT CT TT 658 130D-T TATTGTTATTGATTTTG 659 136A-AGAGCTTTCAGAGGCTG AA AG AG AG 660 136A-G GAGCTTTCGGAGGCTGA 661 137A-AGGGGGAAGATATGGAG GG AG AA AG 662 137A-G GGGGGAAGGTATGGAGT 663 143A-CCATGGCCTCGTGGGTT TC TC TT TC 664 143A-T CATGGCCTTGTGGGTTT 665 147B-AGGGKAGGGAGACCAGC AA AG GG GG 666 147B-G GGGKAGGGGGACCAGCT 667 147C-AGCAGTGTCAGTGTGGG TT AT AA AT 668 147C-T GCAGTGTCTGTGTGGGT 669 147D-AACACCAGCACTTTGAT AA AG GG AG 670 147D-G ACACCAGCGCTTTGATC 671 151A-ACCTTCTGCAACCACAC GG GG AG AG 672 151A-G CCTTCTGCGACCACACC 673 163A-AAAATTCGCAGGAGCCG GG AG GG GG 674 163A-G AAATTCGCGGGAGCCGA 675 164B-AAGGTCTAGACGCTCAC AG GG AG GG 676 164B-G AGGTCTAGGCGCTCACC 677 164C-AGGAGGAACACTTCAAA GG AG GG GG 678 164C-G GGAGGAACGCTTCAAAC 679 170A-ATTTGTGCTATACCTTG AA AG AG AG 680 170A-G TTTGTGCTGTACCTTGA 681 179A-CATGATGCACACACCCT CT CC TT CC 682 179A-T ATGATGCATACACCCTG 683 181B-CTATTGCTCCGCCTCCT CT TT CC TT 684 181B-T TATTGCTCTGCCTCCTC 685 181D-CCTCAGAGACTGTGTGC CG CC CC CC 686 181D-G CTCAGAGAGTGTGTGCC 687 187A-CATCTTCTGCGTCACTC CT CT CC CC 688 187A-T ATCTTCTGTGTCACTCA 689 187B-ACAGCATCTAGTAACCA AG AA GG AG 690 187B-G CAGCATCTGGTAACCAC 691 190A-CATTAGTGCCAAATACA CC CC CT CT 692 190A-T ATTAGTGCTAAATACAT 693 
195B-ATGCTCCACAGCAGCCG AT TT TT TT 694 195B-T TGCTCCACTGCAGCCGT 695 196A-ATAGGGGAGAATCTGTT CC AC AC AA 696 196A-C TAGGGGAGCATCTGTTT 697

[0194] The invention also encompasses a composition comprising a plurality of RCGs immobilized on a surface, wherein the RCGs are composed of a plurality of DNA fragments, each DNA fragment including a (N)_(x)-TARGET polynucleotide structure as described above, i.e., wherein the TARGET portion is identical in all of the DNA fragments of each RCG, the TARGET portion includes at least 7 nucleotide residues, x is an integer from 0 to 9, and each N is any nucleotide residue. Preferably, the TARGET portion includes at least 8 nucleotide residues.

[0195] In other aspects, the invention includes a method for performing DOP-PCR. The prior art DOP-PCR technique was originally developed to amplify the entire genome in cases where DNA was in short supply. This method is accomplished using a primer set wherein each primer has an arbitrarily selected six nucleotide residue portion at its 3′ end. The complexity of the resultant product is extremely high due to the short length of this portion, and the method results in amplification of the genome. By increasing the length of the arbitrarily selected portion of the DOP-PCR primer from 6 nucleotides to 7, and preferably 8 or more, nucleotide residues, the complexity of the genome is significantly reduced.
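The effect of lengthening the arbitrarily selected 3′ portion can be illustrated with a rough calculation. The sketch below assumes a random-sequence model (equal base frequencies, independent positions) and a haploid genome size of 3 × 10⁹ base pairs; both are simplifying assumptions that are not taken from this specification, and the function name is hypothetical. The count of exact matches is only a proxy for product complexity, since amplification also requires two sites in opposite orientation within an amplifiable distance.

```python
def expected_sites(genome_bp: float, target_len: int, strands: int = 2) -> float:
    """Expected number of exact matches to one fixed target sequence.

    Assumes equal base frequencies and independent positions
    (a simplifying assumption; real genomes deviate from it).
    """
    return strands * genome_bp / 4 ** target_len


GENOME_BP = 3.0e9  # assumed haploid human genome size (illustrative value)

for k in (6, 7, 8):
    # Each added target residue cuts the expected number of sites four-fold.
    print(f"target length {k}: ~{expected_sites(GENOME_BP, k):,.0f} expected sites")
```

Under this model, each additional residue in the TARGET portion reduces the expected number of priming sites by a factor of four, which is consistent with the reduction in complexity described above.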

EXAMPLES

Example 1 Identification and Isolation of SNPs

[0196] High allele frequency SNPs are estimated to occur in the human genome once every kilobase or less (Cooper et al., 1985). A method for identifying these SNPs is illustrated in FIG. 1. As shown in FIG. 1, inter-Alu PCR was performed on genomes isolated from three unrelated individuals. The PCR products were cloned, and a mini library was made for each of the 3 individuals. The library clone inserts were PCR-amplified and spotted on nylon filters. Clones were matched by hybridization into sets of matching clones, two from each individual, for a total of 6 clones per matched clone set. These sets of clones were sequenced, and the sequences were compared in order to identify SNPs. This method of identifying SNPs has several advantages over the prior art PCR amplification methods. For instance, a higher quality sequence is obtained from cloned DNA than is obtained from cycle sequencing of PCR products. Additionally, every sequence represents a specific allele, rather than potentially representing a heterozygote. Finally, sequencing ambiguities, Taq polymerase errors, and other sources of sequence error particular to one representation of the sequence are reduced by application of an algorithm which requires that the same variant sequence be present in at least 2 of the 6 clones sampled.
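The requirement that a variant be present in at least 2 of the 6 clones can be sketched as follows. This is a minimal illustration rather than the algorithm actually used: it assumes the six clone sequences have already been aligned to equal length, and the function name and the second data set (a singleton difference) are hypothetical. The first data set uses the 11-residue sequences of SEQ ID NOs: 7 and 8.

```python
from collections import Counter


def call_snps(clones, min_clones=2):
    """Return {position: {base: count}} for candidate SNP positions.

    A position is reported only if at least two different bases are each
    seen in at least `min_clones` clones, so a difference seen in a single
    clone (e.g. a polymerase or sequencing error) is discarded.
    """
    length = len(clones[0])
    if any(len(c) != length for c in clones):
        raise ValueError("clone sequences must be aligned to equal length")
    snps = {}
    for pos in range(length):
        counts = Counter(c[pos].upper() for c in clones)
        supported = {base: n for base, n in counts.items() if n >= min_clones}
        if len(supported) >= 2:
            snps[pos] = supported
    return snps


# Four clones carry G and two carry A at the central position (0-based index 5).
matched_set = ["cccacGgagaa"] * 4 + ["cccacAgagaa"] * 2
print(call_snps(matched_set))

# Hypothetical set in which a difference occurs in only one clone: not reported.
singleton_set = ["cccacGgagaa"] * 5 + ["cccacGgagat"]
print(call_snps(singleton_set))
```

Raising `min_clones` makes the filter more stringent at the cost of missing variants that are sampled only a few times.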

[0197] In general, the Alu PCR method for identifying SNPs can be performed using genomic DNA obtained from independent individuals, unrelated or related. Briefly, Alu PCR is performed, which yields a product having an estimated complexity of approximately 100 different single copy genomic DNA sequences and an average sequence length of between about 500 base pairs and 1 kilobase pair. The PCR products are cloned, and a mini library is made for each individual. Approximately 800 clones are selected from each library and transferred into 96-well dishes. Filter replicas of each plate are hybridized with PCR probes from individual clones selected from one of the libraries in order to create a matched clone set of 6 clones, 2 from each individual. Many sets of clones can be isolated from these libraries. The clones can be sequenced and compared to identify SNPs.

[0198] Methods

[0199] An Alu primer designated primer 8C was designed to produce an Alu PCR product having a complexity of approximately 100 independent products. Primer 8C (having the nucleotide sequence CTT GCA GTG AGC CGA GAT C; SEQ ID NO: 3) is complementary with base pairs 218-237 of the Alu consensus sequence (Britten et al., 1994). In order to reduce the complexity of the product, however, the last base pair of the primer was selected to correspond to base pair 237 of the consensus sequence, a nucleotide which has been shown to be highly variable among Alu sequences. Primer 8C therefore produces a product having complexity lower than that produced using Alu primers which match a segment of the Alu sequence in which there is little variation in nucleotide sequence among Alu family members.

[0200] Preliminary experiments were conducted to estimate the complexity of the product produced by Alu PCR reaction with primer 8C on the CEPH Mega YACs. These preliminary experiments confirmed that primer 8C produced a lower number of Alu PCR products than other Alu PCR primers closely matching less variable sequences in the Alu consensus.

[0201] Three libraries of Alu PCR products were produced from inter-Alu PCR reactions involving genomic DNA derived from three unrelated CEPH individuals designated 201, 1701, and 2301. The reactions were performed at an annealing temperature of 58° C. for 32 cycles using the 8C Alu primer. Each set of PCR reaction products was purified by phenol:chloroform extraction followed by ethanol precipitation. The products were shotgun cloned into the T-vector pCR2.1 (Invitrogen), electroporated into E. coli strain DH10B Electromax cells, and plated on ampicillin-containing LB agar plates. 768 colonies were picked from each of the three libraries into eight 96-well format plates containing LB+ampicillin and grown overnight. The following day, an equal volume of glycerol was added and the plates were stored at −80° C. An initial survey of the picked clones indicated an average insert size of between 500 base pairs and 1 kilobase pair.

[0202] To identify matching clones in each library, 1 microliter of an overnight culture made from each library plate well was subjected to PCR amplification using vector-derived primers. Amplified inserts were spotted onto Hybond™ N+ filters (Amersham) using a 96-pin replicating device such that each filter had 384 products present in duplicate. The DNA was subjected to alkali denaturation by standard methods and fixed by baking at 80° C. for 2 hours. Individual inserts derived from the library were radiolabeled by random hexamer priming and used as probes against the three libraries (6 filters per probe). Hybridization was carried out overnight at 42° C. in buffer containing 50% formamide as described in Sambrook et al. The following day, the filters were washed in 2× standard saline citrate (SSC), 0.1% SDS at room temperature for 15 minutes, followed by 2 washes in 0.1× SSC, 0.1% SDS at 65° C. for 45 minutes each. The filters were then exposed to Kodak X-OMAT X-ray film overnight.

[0203] Results

[0204] FIG. 2 shows the data obtained for identification of SNPs. The results of the gel electrophoresis of inter-Alu PCR genomic DNA products prepared using the 8C primer are shown in FIG. 2A. Mini libraries were prepared from the Alu PCR genomic DNA products. Colonies were picked from the libraries, and inserts were amplified. The inserts were separated by gel electrophoresis to demonstrate that each was a single insert. The gel is shown in FIG. 2B.

[0205] Once the individual amplified inserts were spotted on Hybond™ N+ filters, the inserts were radiolabeled by random hexamer priming and used as probes against the entire contents of the three mini libraries. One of the filters, having 2 positive or matched clones, is shown in FIG. 2C.

[0206] The results of screening 330 base pairs of genomic DNA by the matched clone method led to the identification of 6 SNPs, 4 in single copy DNA and 2 in the flanking Alu sequence. These observations were consistent with the projected rate of SNP occurrence of 1 high frequency SNP per 1,000 base pairs or less. The single copy SNPs identified are presented below in Table I.

TABLE I

CEPH Individual  1                            2                            3                            4
201              taagtGtacaa (SEQ ID NO: 5)   cccacGgagaa (SEQ ID NO: 7)   aattgCttccc (SEQ ID NO: 9)   aaattCaatgt (SEQ ID NO: 11)
                 taagtGtacaa (SEQ ID NO: 5)   cccacGgagaa (SEQ ID NO: 7)   aattgCttccc (SEQ ID NO: 9)   aaattCaatgt (SEQ ID NO: 11)
1701             taagtAtacaa (SEQ ID NO: 6)   cccacAgagaa (SEQ ID NO: 8)   aattgCttccc (SEQ ID NO: 9)   aaattCaatgt (SEQ ID NO: 11)
                 taagtGtacaa (SEQ ID NO: 5)   cccacGgagaa (SEQ ID NO: 7)   aattgTttccc (SEQ ID NO: 10)  aaattCaatgt (SEQ ID NO: 11)
2301             taagtGtacaa (SEQ ID NO: 5)   cccacAgagaa (SEQ ID NO: 8)   aattgCttccc (SEQ ID NO: 9)   aaattAaatgt (SEQ ID NO: 12)
                 taagtGtacaa (SEQ ID NO: 5)   cccacGgagaa (SEQ ID NO: 7)   aattgTttccc (SEQ ID NO: 10)  aaattCaatgt (SEQ ID NO: 11)

[0207] To verify the identities of the SNPs shown in Table I, specific primers were synthesized which permitted amplification of each single copy locus. Cycle sequencing was then performed on PCR products from each of the three unrelated individuals, and the site of the putative SNP was examined. In all cases, the genotype of the individual derived by cycle sequencing was consistent with the genotype observed in the matched clone set.

Example 2 Allele-Specific Oligonucleotide Hybridization to Alu PCR SNPs

[0208] Methods

[0209] Inter-Alu PCR was performed using genomic DNA obtained from 136 members of 8 CEPH families (numbers 102, 884, 1331, 1332, 1347, 1362, 1413, and 1416) using the 8C Alu primer, as described above. The products from these reactions were denatured by alkali treatment (10-fold addition of 0.5 M NaOH, 2.0 M NaCl, 25 mM EDTA) and dot blotted onto multiple Hybond™ N+ filters (Amersham) using a 96-well dot blot apparatus (Schleicher and Schuell). For each SNP, a set of two allele-specific oligonucleotides consisting of two 17-residue oligonucleotides centered on the polymorphic nucleotide residue was synthesized. Each filter was hybridized with 1 picomole of ³²P-kinase labeled allele-specific oligonucleotide and a 50-fold excess of non-labeled competitor oligonucleotide complementary to the opposite allele (Shuber et al., 1993). Hybridizations were carried out overnight at 52° C. in 10 mL TMAC buffer (3.0 M TMAC, 0.6% SDS, 1 mM EDTA, 10 mM NaPO₄, pH 6.8, 5× Denhardt's solution, 40 micrograms/milliliter yeast RNA). Blots were washed for 20 minutes at room temperature in TMAC wash buffer (3 M TMAC, 0.6% SDS, 1 mM EDTA, 10 mM Na₃PO₄ pH 6.8) followed by 20 minutes at 52° C. (52° C.-52° C. is optimal). The blots were then exposed to Kodak X-OMAT AR X-ray film for 8-24 hours and genotypes were determined by the hybridization pattern.
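The construction of a pair of 17-residue allele-specific oligonucleotides centered on the polymorphic residue can be sketched as below. The function name and its argument layout are hypothetical and serve only to illustrate the centering rule (8 flanking residues on each side of the variable position); the example input reproduces SEQ ID NOs: 17 and 18 of the sequence listing, which differ only at the central position.

```python
def aso_pair(flank5, allele_a, allele_b, flank3, length=17):
    """Build two allele-specific oligonucleotides centered on a SNP.

    flank5 / flank3: sequence immediately 5' and 3' of the polymorphic residue.
    allele_a / allele_b: the two single-base alleles.
    length: total oligonucleotide length; must be odd so the SNP is central.
    """
    if length < 3 or length % 2 == 0:
        raise ValueError("length must be an odd number of at least 3")
    half = length // 2
    if len(flank5) < half or len(flank3) < half:
        raise ValueError(f"need at least {half} flanking residues on each side")
    left = flank5[-half:]   # the residues closest to the SNP on the 5' side
    right = flank3[:half]   # the residues closest to the SNP on the 3' side
    return left + allele_a + right, left + allele_b + right


oligo_t, oligo_c = aso_pair("ctgggctg", "t", "c", "attcattt")
print(oligo_t)  # ctgggctgtattcattt
print(oligo_c)  # ctgggctgcattcattt
```

Longer flanking sequences may be supplied; only the residues nearest the polymorphic site are used.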

[0210] Results

[0211] The results of the genotyping and mapping are shown in FIG. 3. In order to determine the map location of the SNP, the genotype data determined from CEPH families number 884 and 1347 were compared to the CEPH genotype database version 8.1 (HTTP:www.cephb.fr/cephdb/) by calculating a 2 point lod score using the computer software program MultiMap version 2.0 running on a Sparc Ultra I computer. This analysis revealed a linkage to marker D3S1292 with a lod score of 5.419 at a theta value of 0.0. To confirm this location, PCR amplification of the CCRSNP1 marker was performed on the Gene Bridge 4 radiation hybrid panel (Research Genetics). This analysis placed marker CCRSNP1 at 4.40 cR from D3S3445 with a lod score greater than 15.0. Integrated maps from the genetic location database (Collins et al., 1996) indicated that the locations of the markers identified by these two independent methods are overlapping. These results support the mapping of even low frequency polymorphisms by two point linkage to markers previously established on CEPH families.
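For orientation, a two-point lod score compares the likelihood of the data at a given recombination fraction theta with the likelihood at theta = 0.5 (no linkage). The sketch below uses the simplest textbook case, fully informative phase-known meioses, which is an assumption made here for illustration; MultiMap handles general pedigrees and this is not a description of its internal computation. The function name is hypothetical.

```python
import math


def lod_phase_known(recombinants, meioses, theta):
    """Two-point lod score for phase-known, fully informative meioses.

    lod(theta) = log10[ theta^r * (1 - theta)^(n - r) / 0.5^n ]
    where n is the number of meioses and r the number of recombinants.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and meioses")
    nonrecombinants = meioses - recombinants

    def log10_likelihood(t):
        total = 0.0
        if recombinants:
            if t == 0.0:
                return float("-inf")  # a recombinant is impossible at theta = 0
            total += recombinants * math.log10(t)
        if nonrecombinants:
            total += nonrecombinants * math.log10(1.0 - t)
        return total

    return log10_likelihood(theta) - log10_likelihood(0.5)


# With no recombinants, the lod score at theta = 0 is n * log10(2).
print(round(lod_phase_known(0, 18, 0.0), 3))  # 5.419
```

Under this simplified model, 18 informative meioses with no recombinants would give a lod score of 18 × log10(2), or about 5.419, at theta = 0.0. Whether the score reported above arose in exactly this way is not stated in the specification.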

[0212] Of the dot blots performed on each CEPH family PCR, two families were informative at this SNP locus, namely families number 884 and 1347. The dot blot is shown in FIG. 3A. Lines are drawn around signals representing CEPH family 884 on the dot blot shown in FIGS. 3A and 3B. Allele-specific oligonucleotide hybridizations were performed on the filters shown in FIGS. 3A and 3B under TMAC buffer conditions with the G allele-specific oligonucleotide (FIG. 3A) and the A allele-specific oligonucleotide (FIG. 3B). The pedigree of CEPH family number 884 with genotypes as scored from the filters shown in FIGS. 3A and 3B is shown in FIG. 3C. The DNA was not available for one individual in this pedigree, and that square is left blank. Mapping of CCRSNP1 was performed by two independent methods. First, genotype data from informative CEPH families numbers 884 and 1347 were compared to the CEPH genotype database version 8.1 by calculation of a 2 point lod score. Secondly, PCR amplification of the CCRSNP1 marker was performed on the Gene Bridge 4 radiation hybrid panel. The markers giving the highest lod scores in these analyses were D3S1292 and D3S3445, respectively, as shown in FIG. 3D.

[0213] The percentage of SNPs detected using the above-described methods is dependent on the number of chromosomes sampled, as well as the allele frequency.
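This dependence can be made concrete with a simple binomial calculation. The sketch below assumes that each sampled chromosome carries the variant allele independently with probability equal to its population frequency, and it considers only the number of copies of the variant allele that are sampled (not whether the other allele is also seen). The function name is hypothetical.

```python
from math import comb


def prob_variant_detected(freq, n_chromosomes, min_copies=1):
    """Probability that a variant allele of frequency `freq` is sampled
    at least `min_copies` times among `n_chromosomes` chromosomes,
    assuming independent sampling (binomial model)."""
    if not 0.0 <= freq <= 1.0:
        raise ValueError("freq must lie between 0 and 1")
    p_too_few = sum(
        comb(n_chromosomes, k) * freq**k * (1.0 - freq) ** (n_chromosomes - k)
        for k in range(min_copies)
    )
    return 1.0 - p_too_few


# Six chromosomes (three individuals), variant required in at least 2 clones.
for f in (0.1, 0.3, 0.5):
    print(f, round(prob_variant_detected(f, 6, min_copies=2), 3))
```

Sampling more chromosomes raises the detection probability for a given allele frequency, while a stricter `min_copies` requirement lowers it.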

Example 3 Confirmation of SNP Identity

[0214] Allele-specific oligonucleotides are synthesized based on standard protocols (Shuber et al., 1997). Briefly, polynucleotides of 17 bases centering on the polymorphic site are synthesized for each allele of a SNP. DNA dots of IRS-PCR or DOP-PCR products are affixed to a membrane and hybridized to end-labeled allele-specific oligonucleotides under TMAC buffer conditions. These conditions are known to equalize the contribution of AT and GC base pairs to melting temperature, thereby providing a uniform temperature for hybridization of allele-specific oligonucleotides independent of nucleotide composition.

[0215] Using this methodology, genotypes of CEPH progenitors and their offspring are determined. The Mendelian segregation of each SNP marker confirms its identity as a SNP marker and provides an estimate of its relative allele frequency and, hence, its likely usefulness as a genetic marker. Markers which yield complex segregation patterns or show very low allele frequencies on CEPH progenitors are set aside for future analysis, and the remaining markers are further characterized.
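A check for Mendelian segregation in a parent-offspring trio can be sketched as follows. The representation of a genotype as a two-letter string (e.g., "AG") and the function name are assumptions made for this illustration.

```python
from itertools import product


def mendelian_consistent(father, mother, child):
    """Return True if the child's genotype can be formed by taking one
    allele from each parent. Genotypes are two-letter strings such as 'AG'."""
    possible = {"".join(sorted(a + b)) for a, b in product(father, mother)}
    return "".join(sorted(child)) in possible


print(mendelian_consistent("AG", "GG", "AG"))  # True
print(mendelian_consistent("AG", "GG", "AA"))  # False: mother cannot transmit A
```

A marker for which many trios fail this check would fall among those showing complex segregation patterns.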

Example 4 Development of Detailed Information on Map Position and Allele Frequency for Each SNP

[0216] Two complementary methods are used to establish genetic map position for each marker. Each marker is genotyped on a number of CEPH families. The result is compared, using MultiMap (Matise et al., 1993, as described above) or other appropriate software, against the CEPH database to determine by linkage the most likely position of the SNP marker.

[0217] Allele frequencies are determined by hybridization with the standard worldwide panel which the U.S. NIH currently is making available to researchers for standardization of allele frequency comparison. The allele-specific oligonucleotide methodology used for genetic mapping is also used to determine allele frequency.
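Once genotypes have been scored, allele frequencies follow from simple allele counting. The sketch below assumes genotypes recorded as two-letter strings for unrelated individuals, with `None` for samples that could not be scored; the function name and the sample data are hypothetical.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Estimate allele frequencies by counting alleles in scored genotypes.

    genotypes: iterable of two-letter strings (e.g. 'AG'); None entries
    (unscored samples) are skipped.
    """
    counts = Counter(allele for g in genotypes if g for allele in g)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no scored genotypes")
    return {allele: n / total for allele, n in counts.items()}


# Hypothetical panel of four scored individuals and one unscored sample.
print(allele_frequencies(["GG", "AG", "GG", "GG", None]))
```

Each scored individual contributes two alleles, so the four scored samples above provide eight allele observations.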

Example 5 Development of a System for Scoring Genotype Using SNPs

[0218] After the identification of a set of SNPs, automated genotyping is performed. Genomic DNA of a well-characterized set of subjects, such as the CEPH families, is PCR-amplified using appropriate primers. These DNA samples serve as the substrate for system development. The DNA is spotted onto multiple glass slides for genotyping. This process can be carried out using a microarray spotting apparatus which can spot greater than 1,000 samples within a square centimeter area or more than 10,000 samples on a typical microscope slide. Each slide is hybridized with a fluorescently tagged allele-specific oligonucleotide under TMAC conditions analogous to those described above. The genotype of each individual is determined by the presence or absence of a signal for a selected set of allele-specific oligonucleotides. A schematic of the method is shown in FIG. 4.

[0219] PCR products are attached to the slide using any methods for attaching DNA to a surface that are known in the art. For instance, PCR products may be spotted onto poly-L-lysine-coated glass slides, and crosslinked by UV irradiation prior to hybridization. A second, more preferred method, which has been developed according to the invention, involves use of oligonucleotides having a 5′ amino group for each of the PCR reactions described above. The PCR products are spotted onto silane-coated slides in the presence of NaOH to covalently attach the products to the slide. This method is advantageous because a covalent bond is formed, which produces a stable attachment to the surface.

[0220] SNP-ASOs are hybridized under TMAC hybridization conditions with the RCGs covalently conjugated to the surface. The allele-specific oligonucleotides are labeled at their 5′-ends with a fluorescent dye (e.g., Cy3). After washing, detection of the fluorescent oligonucleotides is performed in one of two ways. Fluorescent images can be captured using a fluorescence microscope equipped with a CCD camera and automated stage capabilities. Alternatively, the data can be obtained using a microarray scanner (e.g., one made by Genetic Microsystems). A microarray scanner provides image analysis which can be converted to a digital (e.g., +/−) signal for each sample using any of several available software applications (e.g., NIH Image, ScanAnalyze, etc.). The high signal/noise ratio for this analysis allows the determination of data in this mode to be straightforward and automated. These data, once exported, can be manipulated to conform with a format which can be analyzed by any of several human genetics applications such as CRI-MAP and LINKAGE software. Additionally, the methods may involve use of two or more fluorescent dyes or other labels which can be spectrally differentiated to reduce the number of samples which need to be analyzed. For instance, if four spectrally distinct fluorescent dyes (e.g., ABI Prism dyes 6-FAM, HEX, NED, ROX) are used, then four hybridization reactions can be performed in a single hybridization mixture.

Example 6 Reduction of Genome Complexity Using IRS-PCR or DOP-PCR

[0221] The initial step of the SNP identification method and the genotyping approach described above is to reduce the complexity of genomic DNA in a reproducible manner. The purpose of this step with respect to genotyping is to allow genotyping of multiple SNPs using the products of a single PCR reaction. Using the IRS-PCR approach, a PCR primer was synthesized which bears homology to a repetitive sequence present within the genome of the species to be analyzed (e.g., Alu sequence in humans). When two repeat elements bearing the primer sequence are present in a head-to-head fashion within a limited distance (approximately 2 kilobase pairs), the inter-repeat sequence can be amplified. The method has the advantage that the complexity of the resultant PCR product can be controlled by how closely the nucleotide sequence of the chosen primer matches the consensus nucleotide sequence of the repeat element (that is, the closer to the repeat consensus, the more complex the PCR product).

[0222] In detail, a 50 microliter reaction for each sample was set up as follows:

distilled, deionized H₂O (ddH₂O)                                                      30.75 μl
10X PCR Buffer (500 mM KCl, 100 mM Tris-HCl pH 8.3, 15 mM MgCl₂, 0.1% gelatin)       5 μl
1.25 mM dNTPs                                                                         7.5 μl
20 μM Primer 8C                                                                       1.5 μl
Taq polymerase (1.25 units)                                                           0.25 μl
Template (50 ng genomic DNA in ddH₂O)                                                 5.0 μl
Total                                                                                 50 μl
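The reaction set-up above can be checked arithmetically. The sketch below sums the component volumes and derives final concentrations by simple dilution (stock concentration × volume added / total volume). It assumes that the water volume of 30.75 is expressed in microliters, which is what makes the components add up to the stated 50 microliter total; the variable and function names are hypothetical.

```python
# Component volumes in microliters, as listed for the primer 8C reaction.
volumes_ul = {
    "ddH2O": 30.75,            # unit assumed to be microliters
    "10X PCR buffer": 5.0,
    "1.25 mM dNTPs": 7.5,
    "20 uM primer 8C": 1.5,
    "Taq polymerase": 0.25,
    "template": 5.0,
}

total_ul = sum(volumes_ul.values())


def final_concentration(stock, volume_ul, total=total_ul):
    """Final concentration after dilution, in the same unit as `stock`."""
    return stock * volume_ul / total


print("total volume (ul):", total_ul)
print("dNTPs (mM):", final_concentration(1.25, volumes_ul["1.25 mM dNTPs"]))
print("primer 8C (uM):", final_concentration(20.0, volumes_ul["20 uM primer 8C"]))
print("buffer (X):", final_concentration(10.0, volumes_ul["10X PCR buffer"]))
print("MgCl2 (mM):", final_concentration(15.0, volumes_ul["10X PCR buffer"]))
```

The same dilution rule applies to any other component for which a stock concentration is given.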

[0223] The PCR reaction was performed, for example, in a Perkin Elmer 9600 thermal cycler under the following conditions:

1 min.    94° C.
30 sec.   94° C.  |
45 sec.   58° C.  | 32 cycles
90 sec.   72° C.  |
10 min.   72° C.
Hold      4° C.

[0224] An aliquot of the reaction mixture was separated on an agarose gel to confirm successful amplification.

[0225] RCGs were also prepared using DOP-PCR with the following primer (CTC GAG NNN NNN AAG CGA TG) (SEQ ID NO: 4) (wherein N is any nucleotide). DOP-PCR uses a single primer which is typically composed of 3 parts, herein designated tag-(N)_(x)-TARGET. The TARGET portion is a polynucleotide which comprises at least 7, and preferably at least 8, arbitrarily-selected nucleotide residues, x is an integer from 0 to 9, and N is any nucleotide residue. Tag is a polynucleotide as described above.
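The tag-(N)_(x)-TARGET structure can be read directly off the primer sequence, as in the sketch below. The function name is hypothetical, and the parser assumes that the degenerate block is written explicitly as one or more N residues; for a primer with x = 0 the boundary between tag and TARGET cannot be inferred from the sequence alone, so such input is rejected.

```python
import re


def parse_dop_primer(primer):
    """Split a DOP-PCR primer written 5'->3' into tag, N block and TARGET.

    Returns a dict with the tag, x (number of N residues), the TARGET
    portion, and the number of distinct sequences in the degenerate pool.
    """
    seq = primer.replace(" ", "").upper()
    match = re.fullmatch(r"([ACGT]*)(N+)([ACGT]+)", seq)
    if match is None:
        raise ValueError("expected the form tag-(N)x-TARGET with x >= 1")
    tag, n_block, target = match.groups()
    return {
        "tag": tag,
        "x": len(n_block),
        "target": target,
        "pool_size": 4 ** len(n_block),  # each N may be A, C, G or T
    }


print(parse_dop_primer("CTC GAG NNN NNN AAG CGA TG"))
```

For the primer of SEQ ID NO: 4, this yields the tag CTCGAG, x = 6, and the 8-residue TARGET portion AAGCGATG.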

[0226] The initial rounds of DOP-PCR were performed at a low temperature, because the specificity of the reaction is determined primarily by the nucleotide sequence of the TARGET portion and the N residues. A slow ramp time during these cycles ensures that the primers do not detach from the template prior to chain extension. Subsequent amplification rounds were carried out at a higher annealing temperature because the 5′ end of the DOP-PCR primer can also contribute to primer annealing.

[0227] The DOP-PCR method was performed using a reaction mixture comprising the following ingredients:

distilled deionized H₂O                                              24 μl
10X PCR Buffer                                                       5 μl
1.25 mM dNTPs                                                        8 μl
20 μM Primer DOP-BJ1 (SEQ ID NO: 4)                                  7.5 μl
Taq polymerase (1.25 units)                                          0.5 μl
Template (50 ng genomic DNA in distilled deionized H₂O)              5 μl
Total                                                                50 μl

[0228] The PCR reaction was performed, for example, in a Perkin Elmer 9600 thermal cycler using the following reaction conditions:

1 min.     94° C.
1 min.     94° C.          |
1.5 min.   45° C.          | 5 cycles
2 min.     ramp to 72° C.  |
3 min.     72° C.          |
1 min.     94° C.          |
1.5 min.   58° C.          | 35 cycles
3 min.     72° C.          |
10 min.    72° C.
Hold       4° C.

Example 7 Attachment of PCR Products to a Solid Support

[0229] Once the complexity of the genomic DNA from an individual has been reduced, it can be attached to a solid support in order to facilitate hybridization analysis. One method of attaching DNA to a solid support involves spotting PCR products onto a nylon membrane. This protocol was performed as follows:

[0230] Upon completion of the PCR reaction (typically in a 50 μl reaction mixture), a 10-fold amount of denaturing solution (500 mM NaOH, 2.0 M NaCl, 25 mM EDTA) and a small amount (5 μl) of India ink were added. Sixty microliters of product was applied to a pre-wetted Hybond™ N+ membrane (Amersham) using a Schleicher and Schuell 96-well dot blot apparatus. The membrane was immediately removed and placed DNA side up on top of Whatman 3MM paper saturated with 2× SSC for 2 minutes. The filters were air-dried and the DNA was fixed to the membrane by baking in an 80° C. oven for 2 hours. The membranes were then used for hybridization.

[0231] Another method for attaching nucleic acids to a support involves the use of microarrays. This method attaches minute quantities of PCR product samples onto a glass slide. The number of samples that can be spotted is greater than 1000/cm², and therefore over 10,000 samples can be analyzed simultaneously on a glass slide. To accomplish this, pre-cleaned glass slides were placed in a mixture of 80 ml dry xylene, 32 ml 96% 3-glycidoxy-propyltrimethoxy silane, and 160 μl 99% N-ethyldiisopropylamine at 80° C. overnight. The slides were rinsed for 5 minutes in ethyl acetate and dried at 80° C. for 30 minutes. An equal volume of 0.8 M NaOH (0.6 M NaOH and 0.6-0.8 M KOH also work) was added directly to the PCR product (which contained a 5′ amino group incorporated into the PCR primer) and the components were mixed. The resulting solution was spotted onto a glass slide under humid conditions. At the earliest opportunity, the slide was placed in a humid chamber overnight at 37° C. The next day, the slide was removed from the humid chamber and kept at 37° C. for an additional 1 hour. The slide was incubated in an 80° C. oven for 2.5 hours, and then washed for 5 minutes in 0.1% SDS. The slide was washed for an additional 5 minutes in ddH₂O and air dried. Attachment to the slide was monitored by OliGreen staining (obtained from Molecular Probes), which specifically binds single-stranded DNA.

Example 8 Hybridization Using Allele Specific Oligonucleotides for Each SNP

[0232] In order to determine the genotype of an individual at a selected SNP locus, we employed allele-specific oligo hybridizations. Using this method, 2 hybridization reactions were performed at each locus. The first hybridization reaction involved a labeled (radioactive or fluorescent) SNP-ASO (typically 17 nucleotide residues) centered around and complementary to one allele of the SNP. To increase specificity, a 20 to 50-fold excess of non-labeled SNP-ASO complementary to the opposite allele of the SNP was included in the hybridization mixture. For the second hybridization, the allele specificity of the previously labeled and non-labeled SNP-ASOs was reversed. Hybridization occurred in the presence of TMAC buffer, which has the property that oligonucleotides of the same length have the same annealing temperature.

[0233] Specifically, for analysis of each SNP, a pair of SNP allele-specific oligos (SNP-ASOs) consisting of two 17mers centered around the polymorphic nucleotide was synthesized. Each filter was hybridized with 20 pmol ³³P-kinase labeled SNP-ASO (0.66 pmol/ml) and a 50-fold excess of non-labeled competitor oligonucleotide complementary to the other allele of the SNP. Hybridizations were performed overnight at 52° C. in 10 ml TMAC buffer (3.0 M TMAC, 0.6% SDS, 1 mM EDTA, 10 mM NaPO₄ pH 6.8, 5× Denhardt's solution, 40 μg/ml yeast RNA). Blots were washed for 20 minutes at room temperature in TMAC Wash Buffer (3 M TMAC, 0.6% SDS, 1 mM EDTA, 10 mM Na₃PO₄ pH 6.8) followed by 20 minutes of washing at 52° C. The blots were exposed to Kodak X-OMAT AR X-ray film for 8-24 hours, and genotypes were determined by analyzing the hybridization pattern.

Example 9 Scoring the Hybridization Pattern for Each Sample to Determine Genotype

[0234] Hybridization of SNP-ASOs (2 for each locus) to IRS-PCR or DOP-PCR products of several individuals has been performed. The final step in this process is to determine if a positive or negative signal exists for each hybridization for an individual and then, based on this information, determine the genotype for that particular locus. Essentially, all of the detection methods described herein can be reduced to a digital image file, for example using a microarray reader or using a phosphorimager. Presently, there are several software products which will overlay a grid onto the image and determine the signal strength value at each element of the grid. These values are imported into a spreadsheet program, like Microsoft Excel™, and simple analysis is performed to assign each signal a + or − value. Once this is accomplished, an individual's genotype can be determined by its pattern of hybridization to the SNP alleles present at a given locus.
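The scoring step can be sketched as a simple thresholding rule applied to the two signal values obtained for each sample, one per allele-specific hybridization. The function name, the fixed threshold, and the example signal values are hypothetical; in practice the cut-off between + and − would be chosen from the observed signal distribution and background.

```python
def call_genotype(signal_a, signal_b, allele_a, allele_b, threshold):
    """Assign a genotype from two allele-specific hybridization signals.

    A signal at or above `threshold` is scored '+', otherwise '-'.
    +/+ -> heterozygote, +/- or -/+ -> homozygote, -/- -> no call (None).
    """
    positive_a = signal_a >= threshold
    positive_b = signal_b >= threshold
    if positive_a and positive_b:
        return allele_a + allele_b
    if positive_a:
        return allele_a * 2
    if positive_b:
        return allele_b * 2
    return None  # no signal for either allele, e.g. failed PCR or spotting


# Hypothetical grid values for three samples at one A/G locus.
samples = {"s1": (950, 40), "s2": (870, 910), "s3": (30, 25)}
for name, (sig_a, sig_g) in samples.items():
    print(name, call_genotype(sig_a, sig_g, "A", "G", threshold=200))
```

Samples returning no call can be flagged for repeat hybridization rather than being assigned a genotype.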

Example 10 Genomic Analysis Using DOP-PCR

[0235] Genomic DNA isolated from approximately 40 individuals was subjected to DOP-PCR using primer BJ1 (CTC GAG NNN NNN AAG CGA TG) (SEQ ID NO: 4). 100 microliters of the DOP-PCR mixture was precipitated by addition of 10 microliters of 3 M sodium acetate (pH 5.2) and 110 microliters of isopropanol, and the samples were stored at −20° C. for at least 1 hour. The samples were spun down in a microcentrifuge for 30 minutes and the supernatant was removed. The pellets were rinsed with 70% ethanol and spun again for 30 minutes. The supernatant was removed and the pellets were air-dried overnight at room temperature.

[0236] The pellets were then resuspended in 12 microliters of distilled water and stored at −20° C. until denatured by the addition of 3 microliters of 2 N NaOH/50 mM EDTA and maintained at 37° C. for 20 minutes and then at room temperature for 15 minutes. The samples were then spotted onto nylon-coated glass slides using a Genetic Microsystems GMS417 microarrayer. Upon completion of the spotting, the slides were placed in an 80° C. vacuum oven for 2 hours, and then stored at room temperature. A set of 2 allele-specific SNP-ASOs consisting of two 17mers centered around a polymorphic nucleotide residue was synthesized. Each slide was prehybridized for 1 hour in Hyb Buffer (3 M TMAC/0.5% SDS/1 mM EDTA/10 mM NaPO₄/5× Denhardt's solution/40 μg/ml yeast RNA) followed by hybridization with 0.66 picomoles per milliliter ³³P-kinase labeled SNP-ASO and a 50-fold excess of cold-competitor SNP-ASO of the opposite allele in Hyb Buffer. Hybridizations were carried out overnight at 52° C. The slides were washed twice for 30 minutes at room temperature in TMAC Wash Buffer (3 M TMAC, 0.6% SDS, 1 mM EDTA, 10 mM NaPO₄ pH 6.8) followed by 20 minutes at 54° C. The slides were exposed to Kodak BioMax MR X-ray film. The results are shown in FIG. 8. The genotypes were determined by the hybridization patterns shown in FIG. 8 wherein loci are indicated.

[0237] The foregoing written specification is considered to be sufficient to enable one skilled in the art to practice the invention. The present invention is not limited in scope by the examples provided, since the examples are intended as illustrations of various aspects of the invention and other functionally equivalent embodiments are within the scope of the invention. Various modifications of the invention in addition to those shown and described herein will become apparent to those skilled in the art from the foregoing description and fall within the scope of the appended claims. The advantages and objects of the invention are not necessarily encompassed by each embodiment of the invention.

[0238] All references, patents and patent publications that are recited in this application are incorporated in their entirety herein by reference.

1 691 1 9 DNA Homo Sapiens variation (4)...(6) n = a, c, g, or t 1cagnnnctg 9 2 13 DNA Homo Sapiens 2 tttttttttt cag 13 3 19 DNA HomoSapiens 3 cttgcagtga gccgagatc 19 4 20 DNA Homo Sapiens variation(7)...(12) n = a, c, g, or t 4 ctcgagnnnn nnaagcgatg 20 5 11 DNA HomoSapiens 5 taagtgtaca a 11 6 11 DNA Homo Sapiens 6 taagtataca a 11 7 11DNA Homo Sapiens 7 cccacggaga a 11 8 11 DNA Homo Sapiens 8 cccacagaga a11 9 11 DNA Homo Sapiens 9 aattgcttcc c 11 10 11 DNA Homo Sapiens 10aattgtttcc c 11 11 11 DNA Homo Sapiens 11 aaattcaatg t 11 12 11 DNA HomoSapiens 12 aaattaaatg t 11 13 24 DNA Homo Sapiens 13 attaaaggcgtgcgccacca tgcc 24 14 18 DNA Homo Sapiens 14 tttatgaagg cataaaaa 18 1518 DNA Homo Sapiens 15 tttatggagg cataaaaa 18 16 18 DNA Homo Sapiens 16tttatgaagg tataaaaa 18 17 17 DNA Homo Sapiens 17 ctgggctgta ttcattt 1718 17 DNA Homo Sapiens 18 ctgggctgca ttcattt 17 19 17 DNA Homo Sapiens19 tctgcctcct gagtgct 17 20 17 DNA Homo Sapiens 20 tctacctccc aagtgct 1721 17 DNA Homo Sapiens 21 tagctagaat caagctt 17 22 17 DNA Homo Sapiens22 tagctagagt caagctt 17 23 17 DNA Homo Sapiens 23 gctgtgcaac aaatcac 1724 17 DNA Homo Sapiens 24 cagctgtgca aatcacc 17 25 17 DNA Homo Sapiens25 tttcgtgatg tttctat 17 26 17 DNA Homo Sapiens 26 tttcgtgaat gtttcta 1727 17 DNA Homo Sapiens 27 cactgtctac atcttta 17 28 17 DNA Homo Sapiens28 cactgtctcc atcttta 17 29 17 DNA Homo Sapiens 29 taacattctt gaagcca 1730 17 DNA Homo Sapiens 30 taacattcct gaagcca 17 31 17 DNA Homo Sapiens31 gcttccattt cctaagg 17 32 17 DNA Homo Sapiens 32 gcttccactt cctaagg 1733 17 DNA Homo Sapiens 33 aggaatggca ataatcc 17 34 17 DNA Homo Sapiens34 aggaatggcg ataatcc 17 35 17 DNA Homo Sapiens 35 aggaatgaca ataatcc 1736 17 DNA Homo Sapiens 36 ttaaattcgt aaatgga 17 37 17 DNA Homo Sapiens37 ttaaattcat aaatgga 17 38 17 DNA Homo Sapiens 38 taacattctt gaagcca 1739 17 DNA Homo Sapiens 39 taacattcct gaagcca 17 40 17 DNA Homo Sapiens40 ttctgtgact ccacttg 17 41 17 DNA Homo Sapiens 41 ttctgtgact ccatttg 1742 17 DNA Homo Sapiens 42 ttccctgtct 
ccatttg 17 43 17 DNA Homo Sapiens43 gtagtttgcc aggaacc 17 44 17 DNA Homo Sapiens 44 gtagtttgtc aggaacc 1745 17 DNA Homo Sapiens 45 tgctactcct ctactcg 17 46 17 DNA Homo Sapiens46 tgctattcct ctgctcg 17 47 17 DNA Homo Sapiens 47 cttgatcacc ctgatga 1748 17 DNA Homo Sapiens 48 cttggtcacc ctaatga 17 49 17 DNA Homo Sapiens49 gaggtggtgc agagtga 17 50 17 DNA Homo Sapiens 50 gaggtggcgc agagtga 1751 17 DNA Homo Sapiens 51 gaggtggccc agagtga 17 52 17 DNA Homo Sapiens52 cccactgaac cgcacag 17 53 17 DNA Homo Sapiens 53 cccactgagc tgcacag 1754 17 DNA Homo Sapiens 54 cccactcagc cgcacag 17 55 17 DNA Homo Sapiens55 tgaagacaca gccagcc 17 56 17 DNA Homo Sapiens 56 tgaagacgca gccagcc 1757 17 DNA Homo Sapiens 57 tgaagacgaa gccagcc 17 58 17 DNA Homo Sapiens58 agaagttggt accaggg 17 59 17 DNA Homo Sapiens 59 agaagttgtt accaggg 1760 17 DNA Homo Sapiens 60 tatgattacg taatgtt 17 61 17 DNA Homo Sapiens61 tatgattatg taatgtt 17 62 17 DNA Homo Sapiens 62 atgattccag tgagtta 1763 17 DNA Homo Sapiens 63 atgattcctg tgagtta 17 64 17 DNA Homo Sapiens64 catactatta actggaa 17 65 17 DNA Homo Sapiens 65 catattatta acaggaa 1766 17 DNA Homo Sapiens 66 gtcaagaaca ggcaata 17 67 17 DNA Homo Sapiens67 gtcaagaata ggcaata 17 68 17 DNA Homo Sapiens 68 cagactaggg aaccttc 1769 17 DNA Homo Sapiens 69 cagacgaggg aaccttc 17 70 17 DNA Homo Sapiens70 cagactaggg agccttc 17 71 17 DNA Homo Sapiens 71 tgtccagttg tttgcat 1772 17 DNA Homo Sapiens 72 tgtccagtcg tttgcat 17 73 17 DNA Homo Sapiens73 ggggtagcca gtttggt 17 74 17 DNA Homo Sapiens 74 ggggtagcaa gtttggt 1775 17 DNA Homo Sapiens 75 caggaagctg tagctcc 17 76 17 DNA Homo Sapiens76 caggaagccg tagctcc 17 77 17 DNA Homo Sapiens 77 cctgagcctg tctacct 1778 17 DNA Homo Sapiens 78 cctgagcccg tctacct 17 79 17 DNA Homo Sapiens79 taacattctt gaagcca 17 80 17 DNA Homo Sapiens 80 taacattcct gaagcca 1781 17 DNA Homo Sapiens 81 ccaactgaac cgcacag 17 82 17 DNA Homo Sapiens82 ccaactgagc tgcacag 17 83 17 DNA Homo Sapiens 83 gagctagctc acattct 1784 17 DNA Homo Sapiens 84 gagttagctc acgttct 17 
85 17 DNA Homo Sapiens85 acgggggggt ggcgtta 17 86 17 DNA Homo Sapiens 86 acggggggtg gcgttaa 1787 17 DNA Homo Sapiens 87 tagacagcca gcgtcac 17 88 17 DNA Homo Sapiens88 tagatagcca gcatcac 17 89 18 DNA Homo Sapiens 89 gcttttcttg agagtggc18 90 18 DNA Homo Sapiens 90 gcttttcttt agagtggc 18 91 18 DNA HomoSapiens 91 gcttttcgtg agagtggc 18 92 17 DNA Homo Sapiens 92 ctacagataaagttata 17 93 17 DNA Homo Sapiens 93 ctacagatga agttata 17 94 17 DNAHomo Sapiens 94 tagacctgct gctatct 17 95 17 DNA Homo Sapiens 95tagacctgtt gctatct 17 96 17 DNA Homo Sapiens 96 tgttgttctg gcctcca 17 9717 DNA Homo Sapiens 97 tgttgttttg gcctcca 17 98 17 DNA Homo Sapiens 98ttctgagaat ttgttag 17 99 17 DNA Homo Sapiens 99 ttctgagagt ttgttag 17100 17 DNA Homo Sapiens 100 caggaagcag tagctcc 17 101 17 DNA HomoSapiens 101 caggaagccg tagctcc 17 102 17 DNA Homo Sapiens 102 agagtcaggtaagttgc 17 103 17 DNA Homo Sapiens 103 agagtcagat aagttgc 17 104 17 DNAHomo Sapiens 104 agatttcaaa aagtttt 17 105 17 DNA Homo Sapiens 105agattccaaa aggtttt 17 106 17 DNA Homo Sapiens 106 agatttcaaa aagtttt 17107 17 DNA Homo Sapiens 107 cctgagggga gcaatca 17 108 17 DNA HomoSapiens 108 cctgagggaa gcaatca 17 109 17 DNA Homo Sapiens 109 aaggtaagataactaag 17 110 17 DNA Homo Sapiens 110 aaggtaaggt aactaag 17 111 17 DNAHomo Sapiens 111 ggactacaca gagaaac 17 112 17 DNA Homo Sapiens 112ggactacata gagaaac 17 113 17 DNA Homo Sapiens 113 cccaggctac acgaggg 17114 17 DNA Homo Sapiens 114 cccaggctac atgaggg 17 115 17 DNA HomoSapiens 115 cttaccagtt gtgagac 17 116 17 DNA Homo Sapiens 116 cttaccacttgtgagac 17 117 17 DNA Homo Sapiens 117 cttaccagtc gtgagac 17 118 17 DNAHomo Sapiens 118 ctgccctcag gtcttta 17 119 17 DNA Homo Sapiens 119ctgccctccg gtcttta 17 120 17 DNA Homo Sapiens 120 gcaataaaat tgtttta 17121 17 DNA Homo Sapiens 121 gcaatgagat cgtttta 17 122 17 DNA HomoSapiens 122 tgttctgtgg agacccc 17 123 17 DNA Homo Sapiens 123 tgttctgtagagacccc 17 124 17 DNA Homo Sapiens 124 cacattgaat caaagcc 17 125 17 DNAHomo Sapiens 125 cacattgagt caaagcc 17 126 
17 DNA Homo Sapiens 126ggactaccca cccgttc 17 127 17 DNA Homo Sapiens 127 gcgactgcac ccattct 17128 17 DNA Homo Sapiens 128 gcgactgccc ccattct 17 129 17 DNA HomoSapiens 129 cctgggccag ccaggaa 17 130 17 DNA Homo Sapiens 130 cctgggcctgccaggaa 17 131 17 DNA Homo Sapiens 131 ccccaggtaa ccatctt 17 132 17 DNAHomo Sapiens 132 ccccaggtga ccatctt 17 133 17 DNA Homo Sapiens 133ttctgtatat tagctga 17 134 17 DNA Homo Sapiens 134 tttctatatt aactgac 17135 17 DNA Homo Sapiens 135 ggacccggac ggtcttc 17 136 17 DNA HomoSapiens 136 ggacccggtc ggtcttc 17 137 17 DNA Homo Sapiens 137 gtccctaatgttagcat 17 138 17 DNA Homo Sapiens 138 gtccccaatg tcagcat 17 139 17 DNAHomo Sapiens 139 acgggggggt ggcgtta 17 140 17 DNA Homo Sapiens 140acggggggtg gcgttaa 17 141 17 DNA Homo Sapiens 141 tagacagcca gcgtcac 17142 17 DNA Homo Sapiens 142 tagatagcca gcatcac 17 143 17 DNA HomoSapiens 143 gattcttcgt gttcctt 17 144 17 DNA Homo Sapiens 144 gattcttcatgttcctt 17 145 17 DNA Homo Sapiens 145 tgtaaaaact tagaata 17 146 17 DNAHomo Sapiens 146 tgtaaaaatt tagaata 17 147 17 DNA Homo Sapiens 147tgtgaaagcg ctcccaa 17 148 17 DNA Homo Sapiens 148 tgtgaaagtg ctcccaa 17149 17 DNA Homo Sapiens 149 caaaggctca gagaatc 17 150 17 DNA HomoSapiens 150 caaaggctta gagaatc 17 151 17 DNA Homo Sapiens 151 ttaattctctccaaaca 17 152 17 DNA Homo Sapiens 152 ttaaggctct ccggaca 17 153 17 DNAHomo Sapiens 153 ctgccaccgt gcacaca 17 154 17 DNA Homo Sapiens 154ctgccaccat gcacaca 17 155 17 DNA Homo Sapiens 155 ccaaatattc tgattcc 17156 17 DNA Homo Sapiens 156 ccaaatattc ttttttt 17 157 17 DNA HomoSapiens 157 atgagctgac cctccct 17 158 17 DNA Homo Sapiens 158 atgagctgcccctccct 17 159 17 DNA Homo Sapiens 159 acactaggta aaagctc 17 160 17 DNAHomo Sapiens 160 acactaggca aaagctc 17 161 17 DNA Homo Sapiens 161agacaccacg accgagg 17 162 17 DNA Homo Sapiens 162 agacaccaag accgagg 17163 17 DNA Homo Sapiens 163 gcagcgtccg gttaagt 17 164 17 DNA HomoSapiens 164 gcagcgtctg gttaagt 17 165 17 DNA Homo Sapiens 165 cagatactacaaggatg 17 166 17 DNA Homo Sapiens 166 
tacagataca aggatgc 17 167 17 DNAHomo Sapiens 167 tcagctagtg tatctgt 17 168 17 DNA Homo Sapiens 168tcacctagtg tatttgt 17 169 17 DNA Homo Sapiens 169 ttttttattt ttggatt 17170 17 DNA Homo Sapiens 170 ttttaatttt tggattt 17 171 17 DNA HomoSapiens 171 gatattgttt tcattta 17 172 17 DNA Homo Sapiens 172 gatattgtcttcattta 17 173 17 DNA Homo Sapiens 173 agacccggtg ctggtgt 17 174 17 DNAHomo Sapiens 174 agacccggcg ctggtgt 17 175 17 DNA Homo Sapiens 175cttctaagct ttgtctt 17 176 17 DNA Homo Sapiens 176 cttctaagtt ttgtctt 17177 17 DNA Homo Sapiens 177 agttggcaac cagcatg 17 178 17 DNA HomoSapiens 178 agttggcatc cagcatg 17 179 17 DNA Homo Sapiens 179 ggtgaaatggtaattac 17 180 17 DNA Homo Sapiens 180 ggtgaaatag taattac 17 181 17 DNAHomo Sapiens 181 acgggatata acgagtt 17 182 17 DNA Homo Sapiens 182acgggataca acgagtt 17 183 17 DNA Homo Sapiens 183 gggatacaac gagtttc 17184 17 DNA Homo Sapiens 184 gggatacacc gagtttc 17 185 17 DNA HomoSapiens 185 gtatcttggg tgtcctg 17 186 17 DNA Homo Sapiens 186 gtaacttgggtgttctg 17 187 17 DNA Homo Sapiens 187 gggtgtcctg ccccatc 17 188 17 DNAHomo Sapiens 188 gggtgttctg ttttatc 17 189 17 DNA Homo Sapiens 189tgtccagttg ttttgca 17 190 17 DNA Homo Sapiens 190 tgtccagtcg ttttgca 17191 17 DNA Homo Sapiens 191 aagacagccg gaactct 17 192 17 DNA HomoSapiens 192 aagacagcag gaactct 17 193 17 DNA Homo Sapiens 193 tgataggaccaaagaga 17 194 17 DNA Homo Sapiens 194 cgataggact aaagaga 17 195 17 DNAHomo Sapiens 195 tccaaagcca gggccca 17 196 17 DNA Homo Sapiens 196tccaaattca gggccca 17 197 17 DNA Homo Sapiens 197 cctgggccag ccagaag 17198 17 DNA Homo Sapiens 198 cctgggcctg ccagaag 17 199 17 DNA HomoSapiens 199 gattctctga gcctttg 17 200 17 DNA Homo Sapiens 200 gattctctaagcctttg 17 201 17 DNA Homo Sapiens 201 taccattttt tagatga 17 202 17 DNAHomo Sapiens 202 taccatttct tagatga 17 203 17 DNA Homo Sapiens 203ctggaagggc agtgaat 17 204 17 DNA Homo Sapiens 204 tctggacgag ggtgaat 17205 17 DNA Homo Sapiens 205 tagttgcagc acaaatg 17 206 17 DNA HomoSapiens 206 tagttgtagc acaaatg 17 207 17 
DNA Homo Sapiens 207 acactaccgcacagagc 17 208 17 DNA Homo Sapiens 208 acactaccac acagagc 17 209 17 DNAHomo Sapiens 209 aataataagt aaataag 17 210 17 DNA Homo Sapiens 210aataataaat aaataag 17 211 17 DNA Homo Sapiens 211 tggcagtagt tgttcat 17212 17 DNA Homo Sapiens 212 tggcagtaat tgttcat 17 213 17 DNA HomoSapiens 213 aggtatgacg tcataag 17 214 17 DNA Homo Sapiens 214 aggtatgatgtcataag 17 215 17 DNA Homo Sapiens 215 gttgttgttg aagattt 17 216 17 DNAHomo Sapiens 216 ttgttgttga agattta 17 217 17 DNA Homo Sapiens 217gatagtacag gttgtca 17 218 17 DNA Homo Sapiens 218 gatggtacag gtcgtca 17219 17 DNA Homo Sapiens 219 aatataatgt aacagga 17 220 17 DNA HomoSapiens 220 aatataatat aacagga 17 221 17 DNA Homo Sapiens 221 ttaaccatttatctgat 17 222 17 DNA Homo Sapiens 222 ttaaccatat atctgat 17 223 17 DNAHomo Sapiens 223 agagcccagc aaagttc 17 224 17 DNA Homo Sapiens 224agagcccaac aaagttc 17 225 17 DNA Homo Sapiens 225 atcccgaacc ggaaaat 17226 17 DNA Homo Sapiens 226 atcccaaacc gggaaat 17 227 17 DNA HomoSapiens 227 atgacaccac cacaacc 17 228 17 DNA Homo Sapiens 228 atgacaccgccacaacc 17 229 17 DNA Homo Sapiens 229 aggcaaacag atataac 17 230 17 DNAHomo Sapiens 230 aggcaaacgg atataac 17 231 17 DNA Homo Sapiens 231tgtattcact aataaga 17 232 17 DNA Homo Sapiens 232 tgtattcatt aataaga 17233 17 DNA Homo Sapiens 233 ttggcgtata cttcata 17 234 17 DNA HomoSapiens 234 ttggcgtaca cttcata 17 235 17 DNA Homo Sapiens 235 ctcaccacgctccatct 17 236 17 DNA Homo Sapiens 236 ctcaccaccc tccatct 17 237 16 DNAHomo Sapiens 237 atatctaaag gcacag 16 238 17 DNA Homo Sapiens 238tatctacata aaggcac 17 239 17 DNA Homo Sapiens 239 gtgtctccta gtctccc 17240 17 DNA Homo Sapiens 240 gtgtctccca gtctccc 17 241 17 DNA HomoSapiens 241 atgagctgac cctccct 17 242 17 DNA Homo Sapiens 242 atgagctgcccctccct 17 243 17 DNA Homo Sapiens 243 ggacaacatt taattgg 17 244 17 DNAHomo Sapiens 244 ggacaacact taattgg 17 245 17 DNA Homo Sapiens 245gctttaaaat ttttatt 17 246 17 DNA Homo Sapiens 246 gctttaaatt ttttatt 17247 17 DNA Homo Sapiens 247 
aaatttgttc ctaaatg 17 248 17 DNA HomoSapiens 248 aaatttgtac ctaaatg 17 249 17 DNA Homo Sapiens 249 gtgttgttctggcctcc 17 250 17 DNA Homo Sapiens 250 gtgttgtttt ggcctcc 17 251 17 DNAHomo Sapiens 251 tgaatgacaa aaagaca 17 252 17 DNA Homo Sapiens 252tgaatgacga aaagaca 17 253 18 DNA Homo Sapiens 253 actgagccat ctcwccag 18254 17 DNA Homo Sapiens 254 acttaactta agctggc 17 255 17 DNA HomoSapiens 255 gtacttaagc tggcctg 17 256 17 DNA Homo Sapiens 256 actctaatatcccacag 17 257 17 DNA Homo Sapiens 257 actctaatct cccacag 17 258 17 DNAHomo Sapiens 258 cggatcggct ctagttc 17 259 17 DNA Homo Sapiens 259cggatcagct ctagttc 17 260 17 DNA Homo Sapiens 260 tcaaaccaat aaggagg 17261 17 DNA Homo Sapiens 261 tcaaaccagt aaggagg 17 262 17 DNA HomoSapiens 262 gtgtgtgtgt ggggggg 17 263 17 DNA Homo Sapiens 263 gtgtgtgtggggggggt 17 264 17 DNA Homo Sapiens 264 cttaataata atttcat 17 265 17 DNAHomo Sapiens 265 cttaataaca atttcat 17 266 17 DNA Homo Sapiens 266gtgtctccat atgtgtg 17 267 17 DNA Homo Sapiens 267 gtgtctacac atgtgtg 17268 17 DNA Homo Sapiens 268 aactcatcat gatggtt 17 269 17 DNA HomoSapiens 269 aactcataat gatggtt 17 270 17 DNA Homo Sapiens 270 aactcatcacgatggtt 17 271 17 DNA Homo Sapiens 271 atcactcata gcccaga 17 272 17 DNAHomo Sapiens 272 atcacttata gcccaga 17 273 17 DNA Homo Sapiens 273atcactcata tcccaga 17 274 17 DNA Homo Sapiens 274 catcttacca gcattga 17275 17 DNA Homo Sapiens 275 catcttacta gcattga 17 276 17 DNA HomoSapiens 276 agtcagccgg ctctggc 17 277 17 DNA Homo Sapiens 277 agtcagccagctctggc 17 278 17 DNA Homo Sapiens 278 gggtaggagt ggatgag 17 279 17 DNAHomo Sapiens 279 gggcaggagt gggtgag 17 280 17 DNA Homo Sapiens 280gggtaggagt gggtgag 17 281 17 DNA Homo Sapiens 281 tcagtattgt tcttctc 17282 17 DNA Homo Sapiens 282 tcagtatttt tcttctc 17 283 17 DNA HomoSapiens 283 agcagagact gagctcg 17 284 17 DNA Homo Sapiens 284 agcagagaccgagctcg 17 285 17 DNA Homo Sapiens 285 acaggggtcg attcgtc 17 286 17 DNAHomo Sapiens 286 acagggatcg attcgtc 17 287 17 DNA Homo Sapiens 287acaggggtcg tttcgtc 17 288 
17 DNA Homo Sapiens 288 tcccaaagca ttcaagg 17289 17 DNA Homo Sapiens 289 tcccaaagta ttcaagg 17 290 17 DNA HomoSapiens 290 gaccagggtt aatgact 17 291 17 DNA Homo Sapiens 291 gaccagggctaatgact 17 292 17 DNA Homo Sapiens 292 ctattaacag agtcgag 17 293 17 DNAHomo Sapiens 293 ctattaacgg agtcgag 17 294 17 DNA Homo Sapiens 294gtgatactgg atgtctg 17 295 17 DNA Homo Sapiens 295 gtgataccga tgtctgg 17296 17 DNA Homo Sapiens 296 ctctctcgat agtctaa 17 297 17 DNA HomoSapiens 297 ctctctcgct agtctaa 17 298 17 DNA Homo Sapiens 298 tctctcgatagtctaat 17 299 17 DNA Homo Sapiens 299 tctctcgctg gtctaat 17 300 17 DNAHomo Sapiens 300 agatgcaaaa ttcttag 17 301 17 DNA Homo Sapiens 301agatgcacag ttcttag 17 302 17 DNA Homo Sapiens 302 ggaaaatgct caggtag 17303 17 DNA Homo Sapiens 303 ggaaaatgtt caggtag 17 304 17 DNA HomoSapiens 304 tctgggcaga gtgcagg 17 305 17 DNA Homo Sapiens 305 tctgggcagcgtgcagg 17 306 17 DNA Homo Sapiens 306 tatggaacgg ttgcttc 17 307 17 DNAHomo Sapiens 307 tatggaactg ttgcttc 17 308 17 DNA Homo Sapiens 308aagcctggta cccgctg 17 309 17 DNA Homo Sapiens 309 aagcctggca cccgctg 17310 17 DNA Homo Sapiens 310 cattcttctt tttctga 17 311 17 DNA HomoSapiens 311 cattcttcgt tttctga 17 312 17 DNA Homo Sapiens 312 ctgcaggcttgtctgtg 17 313 17 DNA Homo Sapiens 313 ctgcaggttt gtctgtg 17 314 17 DNAHomo Sapiens 314 tgccatttcc tataaca 17 315 17 DNA Homo Sapiens 315tgccatttgc tataaca 17 316 17 DNA Homo Sapiens 316 ccgccacacc cgctcct 17317 17 DNA Homo Sapiens 317 ccgccacagc cgctcct 17 318 17 DNA HomoSapiens 318 caaataatgc tagttat 17 319 17 DNA Homo Sapiens 319 caaataatgttagttat 17 320 17 DNA Homo Sapiens 320 ggatgttgac acgctac 17 321 17 DNAHomo Sapiens 321 ggatgttgtc acgctac 17 322 17 DNA Homo Sapiens 322catgtgtcca acgccat 17 323 17 DNA Homo Sapiens 323 catgtgtcac aacgcca 17324 17 DNA Homo Sapiens 324 aaaggggcct taaagga 17 325 17 DNA HomoSapiens 325 aaaggggctt taaagga 17 326 17 DNA Homo Sapiens 326 tgaaaagttcttttcat 17 327 17 DNA Homo Sapiens 327 tgaaaagtac ttttcat 17 328 17 DNAHomo Sapiens 328 
cctctctatg tgtgagc 17 329 17 DNA Homo Sapiens 329cctctctacg tgtgagc 17 330 17 DNA Homo Sapiens 330 gaagttttag gattctt 17331 17 DNA Homo Sapiens 331 gaagatttag gagtctc 17 332 17 DNA HomoSapiens 332 agggatgtat tttgtta 17 333 17 DNA Homo Sapiens 333 agggatgtgttttgtta 17 334 17 DNA Homo Sapiens 334 acaattcaaa tgtatat 17 335 17 DNAHomo Sapiens 335 acaattcata tgtatat 17 336 17 DNA Homo Sapiens 336cttgcctaac ctgcaca 17 337 17 DNA Homo Sapiens 337 cttgcctagc ctgcaca 17338 17 DNA Homo Sapiens 338 caacagcacc tcatatc 17 339 17 DNA HomoSapiens 339 acagcggtgc ctcgtat 17 340 17 DNA Homo Sapiens 340 actcacagtgtcagggc 17 341 17 DNA Homo Sapiens 341 actcacagcg tcagggc 17 342 17 DNAHomo Sapiens 342 ggctgctcct gtgtctg 17 343 17 DNA Homo Sapiens 343ggctcttcct gtgtctg 17 344 17 DNA Homo Sapiens 344 ggctgctcct gtttctg 17345 17 DNA Homo Sapiens 345 aatagatgcc cttctga 17 346 17 DNA HomoSapiens 346 aatagatgcc ctcttga 17 347 17 DNA Homo Sapiens 347 aatcgatgcccttctga 17 348 17 DNA Homo Sapiens 348 ttggtctagc aggtagc 17 349 17 DNAHomo Sapiens 349 ttggtctacc aggtagc 17 350 17 DNA Homo Sapiens 350agccttggct cttaaaa 17 351 17 DNA Homo Sapiens 351 agccttggtt cttaaaa 17352 17 DNA Homo Sapiens 352 agtctctggc gcctttg 17 353 17 DNA HomoSapiens 353 agtctctgcc gcctttg 17 354 17 DNA Homo Sapiens 354 tagcaggaggcagctta 17 355 17 DNA Homo Sapiens 355 aagcaggagg caactta 17 356 17 DNAHomo Sapiens 356 aagcaggagg cagctta 17 357 17 DNA Homo Sapiens 357tagcaggagg cagcttg 17 358 17 DNA Homo Sapiens 358 aggagagacc ggactcc 17359 17 DNA Homo Sapiens 359 aggagagagc ggactcc 17 360 17 DNA HomoSapiens 360 tacaagtcat ccttcct 17 361 17 DNA Homo Sapiens 361 tacaagtcgtccttcct 17 362 17 DNA Homo Sapiens 362 atacctccct cagacaa 17 363 17 DNAHomo Sapiens 363 atacctcctc agacaag 17 364 17 DNA Homo Sapiens 364aaacaaacaa acaaacc 17 365 17 DNA Homo Sapiens 365 aaacaaacca acaaacc 17366 17 DNA Homo Sapiens 366 gtgcgccacc atgacca 17 367 17 DNA HomoSapiens 367 gtgcgccatc atgacca 17 368 17 DNA Homo Sapiens 368 ggctttcccattagtgg 17 369 17 
DNA Homo Sapiens 369 ggctttccta ttagtgg 17 370 17 DNAHomo Sapiens 370 ccctcacctc tctctca 17 371 17 DNA Homo Sapiens 371ccctcacccc tctctca 17 372 17 DNA Homo Sapiens 372 aatctctcgc gttcatt 17373 17 DNA Homo Sapiens 373 aatctctcac gttcatt 17 374 17 DNA HomoSapiens 374 aatgataccg atcctta 17 375 17 DNA Homo Sapiens 375 aatgatacagatcctta 17 376 17 DNA Homo Sapiens 376 ataaaactgc attcgtg 17 377 17 DNAHomo Sapiens 377 ataaaactac attcgtg 17 378 18 DNA Homo Sapiens 378agttccagga cagccagg 18 379 17 DNA Homo Sapiens 379 atatctccga ctttgaa 17380 17 DNA Homo Sapiens 380 atatctccaa ctttgaa 17 381 17 DNA HomoSapiens 381 tggccctgca gagtctg 17 382 17 DNA Homo Sapiens 382 tggctctgcagagctgg 17 383 17 DNA Homo Sapiens 383 caatggatca aagatgc 17 384 17 DNAHomo Sapiens 384 atggatcaac aaagatg 17 385 17 DNA Homo Sapiens 385gctgcctcaa ggtataa 17 386 17 DNA Homo Sapiens 386 ctgcctctta aggtata 17387 17 DNA Homo Sapiens 387 acctatggct cctcatc 17 388 17 DNA HomoSapiens 388 acctatggtt cctcatc 17 389 17 DNA Homo Sapiens 389 tcttctcccctgcttta 17 390 17 DNA Homo Sapiens 390 tcttctcact gctttag 17 391 17 DNAHomo Sapiens 391 ccgcataaaa agctgag 17 392 17 DNA Homo Sapiens 392ccgccataaa agctgag 17 393 17 DNA Homo Sapiens 393 agaatatagg gtttttt 17394 17 DNA Homo Sapiens 394 tagaatacag ttttttt 17 395 17 DNA HomoSapiens 395 agagttgctg tgcaggg 17 396 17 DNA Homo Sapiens 396 agagttgccgtgcaggg 17 397 17 DNA Homo Sapiens 397 agagttgcag tgcaggg 17 398 17 DNAHomo Sapiens 398 taagcagtgt tcttggc 17 399 17 DNA Homo Sapiens 399taagcagtat tcttggc 17 400 17 DNA Homo Sapiens 400 tcttctcccc tgcttta 17401 17 DNA Homo Sapiens 401 tcttctcact gctttag 17 402 17 DNA HomoSapiens 402 ttttttttta ttattga 17 403 17 DNA Homo Sapiens 403 ttttttttattattgaa 17 404 17 DNA Homo Sapiens 404 tgtggtacgc acatctg 17 405 17 DNAHomo Sapiens 405 tgtggtacac acatctg 17 406 17 DNA Homo Sapiens 406agactcttag acttctg 17 407 17 DNA Homo Sapiens 407 agactcttag gcttctg 17408 17 DNA Homo Sapiens 408 agactcataa gcttctg 17 409 17 DNA HomoSapiens 409 
agactcttag gcttctg 17 410 17 DNA Homo Sapiens 410 cacgtacccgaacgtga 17 411 17 DNA Homo Sapiens 411 cacgtacctg aacgtga 17 412 17 DNAHomo Sapiens 412 attacggttt gtcgtca 17 413 17 DNA Homo Sapiens 413attacggttg gtcgtca 17 414 17 DNA Homo Sapiens 414 ccaagatacg aaaccag 17415 17 DNA Homo Sapiens 415 ccaagatatg aaaccag 17 416 17 DNA HomoSapiens 416 tgcaatgacc agcaacc 17 417 17 DNA Homo Sapiens 417 tgcaacgaccagcaacc 17 418 17 DNA Homo Sapiens 418 tgtaacgacc aacaact 17 419 17 DNAHomo Sapiens 419 tctaaaggga aagatgg 17 420 17 DNA Homo Sapiens 420tctaaaggaa agatgga 17 421 17 DNA Homo Sapiens 421 ctggactcat acataca 17422 17 DNA Homo Sapiens 422 ctggactcgt acataca 17 423 17 DNA HomoSapiens 423 agtttggtcc cctggac 17 424 17 DNA Homo Sapiens 424 agtttggtttcctggac 17 425 17 DNA Homo Sapiens 425 tatagcttca tgtaaaa 17 426 17 DNAHomo Sapiens 426 tatagcttta tgtaaaa 17 427 17 DNA Homo Sapiens 427ttttttttat tattgaa 17 428 17 DNA Homo Sapiens 428 ttttttttta ttattga 17429 17 DNA Homo Sapiens 429 actcattgcc aatttaa 17 430 17 DNA HomoSapiens 430 actcattcag aatttaa 17 431 17 DNA Homo Sapiens 431 atgcgtaatgggggcta 17 432 17 DNA Homo Sapiens 432 atgcgtaacg ggggcta 17 433 17 DNAHomo Sapiens 433 ataattgctc ttttaaa 17 434 17 DNA Homo Sapiens 434gtaattgctc ttttaaa 17 435 17 DNA Homo Sapiens 435 tctgattagt gatggat 17436 17 DNA Homo Sapiens 436 tctgattatg atggatt 17 437 17 DNA HomoSapiens 437 agcagagtgt ctcgtaa 17 438 17 DNA Homo Sapiens 438 agcagagtatctcgtaa 17 439 17 DNA Homo Sapiens 439 gctggcagat atcggta 17 440 17 DNAHomo Sapiens 440 gctggcaggt atcggta 17 441 17 DNA Homo Sapiens 441aactgcaatg accagca 17 442 17 DNA Homo Sapiens 442 aactgcaacg accagca 17443 17 DNA Homo Sapiens 443 gctggtcatt gcagttt 17 444 17 DNA HomoSapiens 444 gttggtcgtt acagttt 17 445 17 DNA Homo Sapiens 445 gctggtcgttgcagttt 17 446 17 DNA Homo Sapiens 446 gctggcagat atcggta 17 447 17 DNAHomo Sapiens 447 gctggcaggt atcggta 17 448 17 DNA Homo Sapiens 448atagaaagtc caccgtc 17 449 17 DNA Homo Sapiens 449 atagaaagcc caccgtc 17450 17 
DNA Homo Sapiens 450 ttagtgaccg tgtaaac 17 451 17 DNA HomoSapiens 451 ttagtgactg tgtaaac 17 452 17 DNA Homo Sapiens 452 ggggaggagctttgttc 17 453 17 DNA Homo Sapiens 453 ggggaggatc tttgttc 17 454 17 DNAHomo Sapiens 454 ggcctggaca caaaagc 17 455 17 DNA Homo Sapiens 455ggcctggaaa caaaagc 17 456 17 DNA Homo Sapiens 456 cccttttcta gtattgt 17457 17 DNA Homo Sapiens 457 cccttttcca gtattgt 17 458 17 DNA HomoSapiens 458 gaattggttt taggaat 17 459 17 DNA Homo Sapiens 459 gaattggtattaggaat 17 460 17 DNA Homo Sapiens 460 acccagcttt ccatggt 17 461 17 DNAHomo Sapiens 461 acccagctct ccatggt 17 462 17 DNA Homo Sapiens 462tcacgttcgg gtacgtg 17 463 17 DNA Homo Sapiens 463 tcacgttcag gtacgtg 17464 17 DNA Homo Sapiens 464 tgccttccgg ttggcaa 17 465 17 DNA HomoSapiens 465 tgccttccag ttggcaa 17 466 17 DNA Homo Sapiens 466 ttttatcatacaattgc 17 467 17 DNA Homo Sapiens 467 ttttatcaga caattgc 17 468 17 DNAHomo Sapiens 468 atcttctctt ctttgag 17 469 17 DNA Homo Sapiens 469atcttctcct ctttgag 17 470 17 DNA Homo Sapiens 470 cagtcctctg ctttctc 17471 17 DNA Homo Sapiens 471 cagtcctcag ctttctc 17 472 17 DNA HomoSapiens 472 ccaagatacg aaaccag 17 473 17 DNA Homo Sapiens 473 ccaagatatgaaaccag 17 474 17 DNA Homo Sapiens 474 ggtattcaag ggttact 17 475 17 DNAHomo Sapiens 475 ggtattcagg gttactg 17 476 17 DNA Homo Sapiens 476acctatggct cctcatc 17 477 17 DNA Homo Sapiens 477 acctatggtt cctcatc 17478 17 DNA Homo Sapiens 478 ttttatcata caattgc 17 479 17 DNA HomoSapiens 479 ttttatcaga caattgc 17 480 17 DNA Homo Sapiens 480 aaccagggcttaagtct 17 481 17 DNA Homo Sapiens 481 aaccagggat taagtct 17 482 17 DNAHomo Sapiens 482 cagaaaaaca gatatac 17 483 17 DNA Homo Sapiens 483cagaaaaaga gatatac 17 484 17 DNA Homo Sapiens 484 tctgagcgtg agtgctg 17485 17 DNA Homo Sapiens 485 tctgagcgcg agtgctg 17 486 17 DNA HomoSapiens 486 acctcagaag cggaggt 17 487 17 DNA Homo Sapiens 487 acctcggaaggggaggt 17 488 17 DNA Homo Sapiens 488 acctcggaag cggaggt 17 489 17 DNAHomo Sapiens 489 taactcgatc gctatca 17 490 17 DNA Homo Sapiens 
490taactcgctt gctatca 17 491 17 DNA Homo Sapiens 491 taactcgctc gctatca 17492 17 DNA Homo Sapiens 492 gaatttctca acttctt 17 493 17 DNA HomoSapiens 493 gaatttctga acttctt 17 494 17 DNA Homo Sapiens 494 caggggtccccaatttg 17 495 17 DNA Homo Sapiens 495 caggggtctc caatttg 17 496 17 DNAHomo Sapiens 496 ttttgctgtg caggcta 17 497 17 DNA Homo Sapiens 497ttttactgtg ccaggct 17 498 17 DNA Homo Sapiens 498 gacagccctg tctcaaa 17499 17 DNA Homo Sapiens 499 agagaaaccc tgtctca 17 500 17 DNA HomoSapiens 500 gcaccggtct gagcagt 17 501 17 DNA Homo Sapiens 501 gcaccggtttgagcagt 17 502 17 DNA Homo Sapiens 502 ccgtgcccct gaacaat 17 503 17 DNAHomo Sapiens 503 ccgtgccctt gaacaat 17 504 17 DNA Homo Sapiens 504tcacgttcgg gtacgtg 17 505 17 DNA Homo Sapiens 505 tcacgttcag gtacgtg 17506 17 DNA Homo Sapiens 506 tgattcgctg ggactct 17 507 17 DNA HomoSapiens 507 tgattcgccg ggactct 17 508 17 DNA Homo Sapiens 508 ttgatatccgaggcctt 17 509 17 DNA Homo Sapiens 509 ttgatatctg aggcctt 17 510 17 DNAHomo Sapiens 510 tccctgggcc aagcata 17 511 17 DNA Homo Sapiens 511tccctgggtc aagcata 17 512 17 DNA Homo Sapiens 512 ttatggctga ggatcac 17513 17 DNA Homo Sapiens 513 ttatggctgc ggatcat 17 514 17 DNA HomoSapiens 514 ttatggcagg ggatcac 17 515 17 DNA Homo Sapiens 515 ctctctgcgctgaagca 17 516 17 DNA Homo Sapiens 516 ctctctgctc tgaagca 17 517 17 DNAHomo Sapiens 517 agatacagag atgtgtt 17 518 17 DNA Homo Sapiens 518agatactgag gtgtgtt 17 519 17 DNA Homo Sapiens 519 cgacatctgg cagatgt 17520 17 DNA Homo Sapiens 520 cgacatctag cagatgt 17 521 17 DNA HomoSapiens 521 gtcacaaata gtatttc 17 522 17 DNA Homo Sapiens 522 gtcacaaagagtatttc 17 523 17 DNA Homo Sapiens 523 aaggtgtgtg cgtgtgt 17 524 17 DNAHomo Sapiens 524 aaggtgtgcg cgtgtgt 17 525 17 DNA Homo Sapiens 525agtctttttt ttcctga 17 526 17 DNA Homo Sapiens 526 tagtcttttt tcctgaa 17527 17 DNA Homo Sapiens 527 caggctgtgg gaggctt 17 528 17 DNA HomoSapiens 528 caggctgcgg aaggctt 17 529 17 DNA Homo Sapiens 529 ctgtaagtcattcaata 17 530 17 DNA Homo Sapiens 530 ctgtaagtaa ttcaata 17 
531 17 DNAHomo Sapiens 531 caggggtccc caatttg 17 532 17 DNA Homo Sapiens 532caggggtctc caatttg 17 533 17 DNA Homo Sapiens 533 gactcatggc cgccttg 17534 17 DNA Homo Sapiens 534 gactcattgc cgcctgg 17 535 17 DNA HomoSapiens 535 gactcctggc cgcctgg 17 536 17 DNA Homo Sapiens 536 gactcctggctgcctgg 17 537 17 DNA Homo Sapiens 537 gactcctggc cgcctgg 17 538 17 DNAHomo Sapiens 538 acaggggagg aaggaag 17 539 17 DNA Homo Sapiens 539acaggggaag gaaggaa 17 540 17 DNA Homo Sapiens 540 ttgatataga ttgattc 17541 17 DNA Homo Sapiens 541 ttgatatata ttgattc 17 542 17 DNA HomoSapiens 542 atagaacagc aaagtaa 17 543 17 DNA Homo Sapiens 543 atagaacaacaaagtaa 17 544 17 DNA Homo Sapiens 544 aacaagcatc tatggat 17 545 17 DNAHomo Sapiens 545 aacaagcacc tatggat 17 546 17 DNA Homo Sapiens 546gagcaggtta agcgatg 17 547 17 DNA Homo Sapiens 547 gagcaggtga agcgatg 17548 17 DNA Homo Sapiens 548 ggcttccagc ttgattc 17 549 17 DNA HomoSapiens 549 ggcttccaac ttgattc 17 550 17 DNA Homo Sapiens 550 agatagggatgaatccc 17 551 17 DNA Homo Sapiens 551 agataggggt gaatccc 17 552 17 DNAHomo Sapiens 552 tcattcaccg tttattg 17 553 17 DNA Homo Sapiens 553tcattcactg tttattg 17 554 17 DNA Homo Sapiens 554 ctgacatact gcttagg 17555 17 DNA Homo Sapiens 555 ctgacatatt gcttagg 17 556 17 DNA HomoSapiens 556 ctaggaaagc ctaaatt 17 557 17 DNA Homo Sapiens 557 ctaggaaaacctaaatt 17 558 17 DNA Homo Sapiens 558 atgtcaggat tttaaga 17 559 17 DNAHomo Sapiens 559 atgtcagggt tttaaga 17 560 17 DNA Homo Sapiens 560ggtttccaat tggaaag 17 561 17 DNA Homo Sapiens 561 ggtttccagt tggaaag 17562 17 DNA Homo Sapiens 562 cgaggagtgc aaagcga 17 563 17 DNA HomoSapiens 563 cgaggagtcc aaagcga 17 564 17 DNA Homo Sapiens 564 tgtgtgtgtgtctgtct 17 565 17 DNA Homo Sapiens 565 tgtgtgtgcg tctgtct 17 566 17 DNAHomo Sapiens 566 gcaagatgca gctgcat 17 567 17 DNA Homo Sapiens 567gcaagatgta gctgcat 17 568 17 DNA Homo Sapiens 568 gctggggcta ttctgta 17569 17 DNA Homo Sapiens 569 gctggggcca ttctgta 17 570 17 DNA HomoSapiens 570 caataacgga cctgcct 17 571 17 DNA Homo Sapiens 571 
caataacgaacctgcct 17 572 17 DNA Homo Sapiens 572 tagcctctct acatagg 17 573 17 DNAHomo Sapiens 573 tagcctctgt acatagg 17 574 17 DNA Homo Sapiens 574catctatagg ttcactt 17 575 17 DNA Homo Sapiens 575 catctatatg ttcactt 17576 17 DNA Homo Sapiens 576 gccaacaaca ttgagag 17 577 17 DNA HomoSapiens 577 gccaacaaga ttgagag 17 578 17 DNA Homo Sapiens 578 gggtcgtgcgtccccct 17 579 17 DNA Homo Sapiens 579 gggtcgtgtg tccccct 17 580 17 DNAHomo Sapiens 580 attgtctcac atttctt 17 581 17 DNA Homo Sapiens 581attgtctcgc atttctt 17 582 17 DNA Homo Sapiens 582 ggtgtggtcg cagaagg 17583 17 DNA Homo Sapiens 583 ggtgtggttg cagaagg 17 584 17 DNA HomoSapiens 584 tcattgccac acttgaa 17 585 17 DNA Homo Sapiens 585 tcattgccgcacttgaa 17 586 17 DNA Homo Sapiens 586 atctgtctac aatgatc 17 587 17 DNAHomo Sapiens 587 atctgtctgc aatgatc 17 588 17 DNA Homo Sapiens 588ggctgggcac agtggct 17 589 17 DNA Homo Sapiens 589 ggctgggcgc agtggct 17590 17 DNA Homo Sapiens 590 cagcctggag aacaagt 17 591 17 DNA HomoSapiens 591 cagcctggcg aacaagt 17 592 17 DNA Homo Sapiens 592 tttgacacccggaagct 17 593 17 DNA Homo Sapiens 593 tttgacactc ggaagct 17 594 17 DNAHomo Sapiens 594 ctgcctttca tactgcc 17 595 17 DNA Homo Sapiens 595ctgcctttta tactgcc 17 596 17 DNA Homo Sapiens 596 acaatagacg ttccccg 17597 17 DNA Homo Sapiens 597 acaatagatg ttccccg 17 598 17 DNA HomoSapiens 598 ggtgtttgat ttgtact 17 599 17 DNA Homo Sapiens 599 ggtgtttgctttgtact 17 600 17 DNA Homo Sapiens 600 tccaactcaa aaaatgt 17 601 17 DNAHomo Sapiens 601 tccaactcta aaaatgt 17 602 17 DNA Homo Sapiens 602gggccgctca cagtcca 17 603 17 DNA Homo Sapiens 603 gggccgctta cagtcca 17604 17 DNA Homo Sapiens 604 gcatggctcg tgggttt 17 605 17 DNA HomoSapiens 605 gcatggcttg tgggttt 17 606 17 DNA Homo Sapiens 606 gttgggaagtggagcgg 17 607 17 DNA Homo Sapiens 607 gttgggaatt ggagcgg 17 608 17 DNAHomo Sapiens 608 aagggatgag gatgtga 17 609 17 DNA Homo Sapiens 609aagggatggg gatgtga 17 610 17 DNA Homo Sapiens 610 tcctcgagag ctttgct 17611 17 DNA Homo Sapiens 611 tcctcgaggg ctttgct 17 612 17 
DNA HomoSapiens 612 tgacaatgcg tgcccaa 17 613 17 DNA Homo Sapiens 613 tgacaatgtgtgcccaa 17 614 17 DNA Homo Sapiens 614 tccatgtcat agatttc 17 615 17 DNAHomo Sapiens 615 tccatgtcgt agatttc 17 616 17 DNA Homo Sapiens 616tggaggacag tggaggg 17 617 17 DNA Homo Sapiens 617 tggaggactg tggaggg 17618 17 DNA Homo Sapiens 618 acccatttcc tgaaaat 17 619 17 DNA HomoSapiens 619 acccattttc tgaaaat 17 620 17 DNA Homo Sapiens 620 ctgagttcggcactgct 17 621 17 DNA Homo Sapiens 621 ctgagttctg cactgct 17 622 17 DNAHomo Sapiens 622 accagtttgg ctcaaag 17 623 17 DNA Homo Sapiens 623accagttttg ctcaaag 17 624 17 DNA Homo Sapiens 624 ccaatcagaa cgtgcag 17625 17 DNA Homo Sapiens 625 ccaatcagag cgtgcag 17 626 17 DNA HomoSapiens 626 acccacacag acactgc 17 627 17 DNA Homo Sapiens 627 acccacactgacactgc 17 628 17 DNA Homo Sapiens 628 ggacaaagcg ctggtgt 17 629 17 DNAHomo Sapiens 629 ggacaaagtg ctggtgt 17 630 17 DNA Homo Sapiens 630agctggtccc cctmccc 17 631 17 DNA Homo Sapiens 631 agctggtctc cctmccc 17632 17 DNA Homo Sapiens 632 ggtgtagtaa gcacagc 17 633 17 DNA HomoSapiens 633 ggtgtagtca gcacagc 17 634 17 DNA Homo Sapiens 634 agcgaacacgggggaaa 17 635 17 DNA Homo Sapiens 635 agcgaacatg ggggaaa 17 636 17 DNAHomo Sapiens 636 gtgacagcac caaactt 17 637 17 DNA Homo Sapiens 637gtgacagcgc caaactt 17 638 17 DNA Homo Sapiens 638 gtctgttgct gttattt 17639 17 DNA Homo Sapiens 639 gtctgttgtt gttattt 17 640 17 DNA HomoSapiens 640 accagcatag cccagag 17 641 17 DNA Homo Sapiens 641 accagcatggcccagag 17 642 17 DNA Homo Sapiens 642 cgtaggagac aagacct 17 643 17 DNAHomo Sapiens 643 cgtaggaggc aagacct 17 644 17 DNA Homo Sapiens 644ctctgctgaa tctccca 17 645 17 DNA Homo Sapiens 645 ctctgctgga tctccca 17646 17 DNA Homo Sapiens 646 aagcaaagac tgattca 17 647 17 DNA HomoSapiens 647 aagcaaagtc tgattca 17 648 17 DNA Homo Sapiens 648 aggcagctagagggaga 17 649 17 DNA Homo Sapiens 649 aggcagctcg agggaga 17 650 17 DNAHomo Sapiens 650 ttccattccg ttcaatt 17 651 17 DNA Homo Sapiens 651ttccattctg ttcaatt 17 652 17 DNA Homo Sapiens 652 
tattgttact gattttg 17653 17 DNA Homo Sapiens 653 tattgttatt gattttg 17 654 17 DNA HomoSapiens 654 gagctttcag aggctga 17 655 17 DNA Homo Sapiens 655 gagctttcggaggctga 17 656 17 DNA Homo Sapiens 656 gggggaagat atggagt 17 657 17 DNAHomo Sapiens 657 gggggaaggt atggagt 17 658 17 DNA Homo Sapiens 658catggcctcg tgggttt 17 659 17 DNA Homo Sapiens 659 catggccttg tgggttt 17660 17 DNA Homo Sapiens 660 gggkagggag accagct 17 661 17 DNA HomoSapiens 661 gggkaggggg accagct 17 662 17 DNA Homo Sapiens 662 gcagtgtcagtgtgggt 17 663 17 DNA Homo Sapiens 663 gcagtgtctg tgtgggt 17 664 17 DNAHomo Sapiens 664 acaccagcac tttgatc 17 665 17 DNA Homo Sapiens 665acaccagcgc tttgatc 17 666 17 DNA Homo Sapiens 666 ccttctgcaa ccacacc 17667 17 DNA Homo Sapiens 667 ccttctgcga ccacacc 17 668 17 DNA HomoSapiens 668 aaattcgcag gagccga 17 669 17 DNA Homo Sapiens 669 aaattcgcgggagccga 17 670 17 DNA Homo Sapiens 670 aggtctagac gctcacc 17 671 17 DNAHomo Sapiens 671 aggtctaggc gctcacc 17 672 17 DNA Homo Sapiens 672ggaggaacac ttcaaac 17 673 17 DNA Homo Sapiens 673 ggaggaacgc ttcaaac 17674 17 DNA Homo Sapiens 674 tttgtgctat accttga 17 675 17 DNA HomoSapiens 675 tttgtgctgt accttga 17 676 17 DNA Homo Sapiens 676 atgatgcacacaccctg 17 677 17 DNA Homo Sapiens 677 atgatgcata caccctg 17 678 17 DNAHomo Sapiens 678 tattgctccg cctcctc 17 679 17 DNA Homo Sapiens 679tattgctctg cctcctc 17 680 17 DNA Homo Sapiens 680 ctcagagact gtgtgcc 17681 17 DNA Homo Sapiens 681 ctcagagagt gtgtgcc 17 682 17 DNA HomoSapiens 682 atcttctgcg tcactca 17 683 17 DNA Homo Sapiens 683 atcttctgtgtcactca 17 684 17 DNA Homo Sapiens 684 cagcatctag taaccac 17 685 17 DNAHomo Sapiens 685 cagcatctgg taaccac 17 686 17 DNA Homo Sapiens 686attagtgcca aatacat 17 687 17 DNA Homo Sapiens 687 attagtgcta aatacat 17688 17 DNA Homo Sapiens 688 tgctccacag cagccgt 17 689 17 DNA HomoSapiens 689 tgctccactg cagccgt 17 690 17 DNA Homo Sapiens 690 taggggagaatctgttt 17 691 17 DNA Homo Sapiens 691 taggggagca tctgttt 17
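
Many adjacent entries in the listing above appear to be allele-specific oligonucleotides for the same SNP, differing at a single base (for example, SEQ ID NOs 43 and 44). The following Python sketch shows one way to locate the differing position between two such oligonucleotides. It is an illustration only: the function name is hypothetical, and the assumption that consecutive entries form allele pairs is inferred from the listing, not stated in it. Some neighboring entries differ at more than one position, so the function reports every mismatch.

```python
from typing import List, Tuple


def mismatch_positions(oligo_a: str, oligo_b: str) -> List[Tuple[int, str, str]]:
    """Return (1-based position, base in a, base in b) for each differing base.

    Hypothetical helper: compares two equal-length oligonucleotides
    position by position, ignoring case.
    """
    if len(oligo_a) != len(oligo_b):
        raise ValueError("oligonucleotides must have the same length")
    pairs = zip(oligo_a.lower(), oligo_b.lower())
    return [(i + 1, a, b) for i, (a, b) in enumerate(pairs) if a != b]


# Sequences copied from the listing, with the column-break spaces removed.
SEQ_43 = "gtagtttgccaggaacc"
SEQ_44 = "gtagtttgtcaggaacc"

print(mismatch_positions(SEQ_43, SEQ_44))  # [(9, 'c', 't')]
```

For a 17-mer, a mismatch at position 9 places the polymorphic base at the center of the oligonucleotide.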

We claim:
 1. A method for detecting the presence or absence of a single nucleotide polymorphism (SNP) allele in a genomic sample, the method comprising: preparing a reduced complexity genome (RCG) from the genomic sample, and analyzing the RCG for the presence or absence of a SNP allele.
 2. The method of claim 1, wherein the analysis comprises hybridizing a SNP-ASO and the RCG, wherein the SNP-ASO is complementary to one allele of a SNP, whereby the allele of the SNP is present in the genomic sample if the SNP-ASO hybridizes with the RCG, and wherein the presence or absence of the SNP is used to characterize the genomic sample.
 3. The method of claim 2, wherein the RCG is immobilized on a surface.
 4. The method of claim 2, wherein the SNP-ASO is immobilized on a surface.
 5. The method of claim 2, wherein the SNP-ASO is individually hybridized with a plurality of RCGs.
 6. The method of claim 1, wherein the RCG is a PCR-derived RCG.
 7. The method of claim 1, wherein the RCG is a native RCG.
 8. The method of any one of claims 1-7, wherein the method further comprises identifying a genotype of the genomic sample, whereby the genotype is identified by the presence or absence of the alleles of the SNP in the RCG.
 9. The method of any one of claims 1-7, wherein the genomic sample is obtained from a tumor.
 10. The method of claim 9, wherein a plurality of RCGs are prepared from genomic samples isolated from a plurality of subjects and the plurality of RCGs are analyzed for the presence of the SNP.
 11. The method of claim 8, wherein the presence or absence of the SNP allele is analyzed in a plurality of genomic samples selected randomly from a population, the method further comprising determining the allelic frequency of the SNP allele in the population by comparing the number of genomic samples in which the allele is detected and the number of genomic samples analyzed.
 12. The method of claim 1, wherein the RCG is prepared by performing degenerate oligonucleotide priming-polymerase chain reaction (DOP-PCR) using a degenerate oligonucleotide primer having a tag-(N)_(x)-TARGET nucleotide sequence, wherein the TARGET nucleotide sequence includes at least 7 TARGET nucleotide residues, wherein x is an integer from 0 to 9, and wherein each N is any nucleotide residue, and wherein the tag is a polynucleotide having from about 0 to about 20 nucleotides.
 13. The method of claim 12, wherein the TARGET nucleotide sequence includes at least 8 nucleotide residues.
 14. The method of claim 6, wherein the RCG is prepared by interspersed repeat sequence-polymerase chain reaction (IRS-PCR).
 15. The method of claim 6, wherein the RCG is prepared by arbitrarily primed-polymerase chain reaction (AP-PCR).
 16. The method of claim 6, wherein the RCG is prepared by adapter-polymerase chain reaction.
 17. The method of claim 2, wherein at least a fraction of the SNP-ASO is labeled.
 18. The method of claim 17, wherein an excess of a non-labeled SNP-ASO is added during the hybridization step, wherein the non-labeled oligonucleotide is complementary to a different allele of the same SNP than the labeled SNP-ASO.
 19. The method of claim 17, further comprising performing a parallel hybridization reaction wherein the RCG is hybridized with a labeled SNP-ASO, wherein the oligonucleotide is complementary to a different allele of the same SNP than the labeled SNP-ASO.
 20. The method of claim 19, wherein the two SNP-ASOs are distinguishably labeled.
 21. The method of claim 17, wherein an excess of non-labeled SNP-ASO is present during the hybridization.
 22. The method of claim 2, wherein the SNP-ASO is composed of from about 10 to about 50 nucleotide residues.
 23. The method of claim 22, wherein the SNP-ASO is composed of from about 10 to about 25 nucleotide residues.
 24. The method of claim 17, wherein the label is a radioactive isotope.
 25. The method of claim 24, further comprising the step of exposing the RCG to a film to produce a signal on the film which corresponds to the radioactively labeled hybridization products if the SNP is present in the RCG.
 26. The method of claim 17, wherein the label is a fluorescent molecule.
 27. The method of claim 26, further comprising the step of exposing the RCG to an automated fluorescence reader to generate an output signal which corresponds to the fluorescently labeled hybridization products if the SNP is present in the RCG.
 28. The method of claim 17, wherein a plurality of SNP-ASOs are labeled with fluorescent molecules, each SNP-ASO being labeled with a spectrally distinct fluorescent molecule.
 29. The method of claim 28, wherein the number of SNP-ASOs having a spectrally distinct fluorescent molecule is at least two.
 30. The method of claim 28, wherein the number is selected from the group consisting of three, four and eight.
 31. The method of claim 2, wherein a plurality of RCGs are labeled with fluorescent molecules, each RCG being labeled with a spectrally distinct fluorescent molecule, and wherein all of the RCGs having a spectrally distinct fluorescent molecule.
 32. The method of claim 1, wherein the RCG is prepared by performing degenerate oligonucleotide priming-polymerase chain reaction using a degenerate oligonucleotide primer having a tag-(N)_(x)-TARGET nucleotide sequence, wherein the TARGET nucleotide sequence includes fewer than 7 TARGET nucleotide residues, wherein x is an integer from 0 to 9, wherein each N is any nucleotide residue, and wherein the tag is a polynucleotide having from about 0 to 20 nucleotides.
 33. The method of claim 32, wherein the TARGET nucleotide sequence includes at least 5 nucleotide residues.
 34. The method of claim 32, wherein the TARGET nucleotide sequence includes at least 6 nucleotide residues.
 35. The method of claim 2, wherein the RCG is labeled.
 36. The method of claim 4, wherein a plurality of different SNP-ASOs are attached to the surface.
 37. The method of claim 1, wherein the RCG is prepared by performing multiple primed DOP-PCR.
 38. The method of claim 2, wherein the genomic sample is characterized by generating a genomic pattern based on the presence or absence of the allele of the SNP in the genomic sample.
 39. The method of claim 38, wherein the genomic pattern is a genomic classification code.
 40. A method for characterizing a tumor, the method comprising: isolating genomic DNA from tumor samples obtained from a plurality of subjects, preparing a RCG from each genomic DNA, performing a hybridization reaction with a SNP-ASO and the plurality of RCGs, wherein the SNP-ASO is complementary to one allele of a SNP, and characterizing the tumor based on whether the SNP-ASO hybridizes with at least some of the RCGs, whereby if the SNP oligonucleotide hybridizes with at least some of the RCGs, then the allele of the SNP is present in the genomic DNA of the tumor.
 41. The method of claim 40, wherein the hybridization reaction is performed with a plurality of SNP-ASOs immobilized on a surface, and wherein the hybridization is performed on the plurality of RCGs, each RCG being analyzed separately.
 42. The method of claim 40, wherein the RCGs are prepared by performing DOP-PCR using a degenerate oligonucleotide primer having a tag-(N)_(x)-TARGET nucleotide sequence, wherein the TARGET nucleotide sequence includes at least 7 TARGET nucleotide residues and wherein x is an integer from 0 to 9, wherein each N is any nucleotide residue, and wherein each tag is a polynucleotide having from 0 to about 20 nucleotide residues.
 43. The method of claim 42, wherein the TARGET nucleotide sequence includes at least 8 nucleotide residues.
 44. The method of claim 40, wherein the RCGs are PCR-generated RCGs.
 45. The method of claim 40, wherein the RCGs are native RCGs.
 46. The method of claim 40, wherein the RCG is prepared by performing DOP-PCR using a degenerate oligonucleotide primer having a tag-(N)_(x)-TARGET nucleotide sequence, wherein the TARGET nucleotide sequence includes fewer than 7 TARGET nucleotide residues and wherein x is an integer from 0 to 9, wherein each N is any nucleotide residue, and wherein each tag is a polynucleotide having from 0 to about 20 nucleotide residues.
 47. A method for generating a genomic pattern for an individual genome, the method comprising: preparing a RCG from the individual genome, analyzing the RCG for the presence or absence of at least one SNP allele, and generating a genomic pattern for the individual genome based on the presence or absence of SNP alleles.
 48. The method of claim 47, wherein analyzing the RCG involves hybridizing the RCG with a panel of SNP-ASOs, each of which is complementary to one allele of a SNP, and identifying the genomic pattern by determining the ability of the RCG to hybridize with each SNP-ASO.
 49. The method of claim 47, wherein the genomic pattern is a genomic classification code which is generated from the pattern of SNP alleles for each RCG.
 50. The method of claim 49, wherein the genomic classification code is also generated using the allelic frequency of the SNPs.
 51. The method of claim 47, wherein the genomic pattern is a visual pattern.
 52. The method of claim 47, wherein the genomic pattern is a digital pattern.
 53. The method of claim 48, wherein the SNP-ASOs are immobilized on a surface.
 54. The method of claim 47, further comprising performing a parallel reaction wherein the hybridization reaction is performed using a panel of labeled complementary SNP-ASOs.
 55. The method of claim 54, wherein the RCG is immobilized on a surface and wherein each SNP-ASO of the panel is hybridized with a separate surface.
 56. The method of claim 54, wherein the RCG is immobilized on a surface and wherein a plurality of SNP-ASOs of the panel are hybridized with a single surface, each SNP-ASO being labeled with a spectrally distinct fluorescent molecule.
 57. The method of claim 47, wherein the RCG is prepared by performing DOP-PCR using a degenerate oligonucleotide primer having a tag-(N)_(x)-TARGET nucleotide sequence, wherein the TARGET nucleotide sequence includes at least 7 TARGET nucleotide residues and wherein x is an integer from 0 to 9, wherein each N is any nucleotide residue, and wherein each tag is a polynucleotide having from 0 to about 20 nucleotide residues.
 58. The method of claim 47, wherein the RCG is a PCR-generated RCG.
 59. The method of claim 47, wherein the RCG is a native RCG.
 60. The method of claim 47, wherein the RCG is prepared by performing DOP-PCR using a degenerate oligonucleotide primer having a tag-(N)_(x)-TARGET nucleotide sequence, wherein the TARGET nucleotide sequence includes less than 7 TARGET nucleotide residues and wherein x is an integer from 0 to 9, wherein each N is any nucleotide residue, and wherein each tag is a polynucleotide having from 0 to about 20 nucleotide residues.
 61. A method for generating a genomic classification code for a genome, the method comprising: preparing a RCG from the genome, analyzing the RCG for the presence or absence of SNP alleles of known allelic frequency, and identifying a genomic pattern of SNP alleles for the RCG by determining the presence or absence therein of SNP alleles, and generating a genomic classification code for the RCG based on the presence or absence and the allelic frequency of the SNP alleles.
 62. The method of claim 61, wherein the RCG is hybridized with a panel of SNP-ASOs of known allelic frequency, each of which is complementary to one allele of a SNP, and identifying the genomic pattern based on whether each SNP-ASO hybridizes with the RCG.
 63. The method of claim 62, wherein the SNP-ASOs are immobilized on a surface.
 64. The method of claim 62, wherein the RCG is immobilized on a surface.
 65. The method of claim 61, wherein the RCG is prepared by performing DOP-PCR using a degenerate oligonucleotide primer having a tag-(N)_(x)-TARGET nucleotide sequence, wherein the TARGET nucleotide sequence includes at least 7 TARGET nucleotide residues and wherein x is an integer from 0 to 9, wherein each N is any nucleotide residue, and wherein each tag is a polynucleotide having from 0 to about 20 nucleotide residues.
 66. The method of claim 61, wherein the RCG is a PCR-generated RCG.
 67. The method of claim 61, wherein the RCG is a native RCG.
 68. The method of claim 61, wherein the RCG is prepared by performing DOP-PCR using a degenerate oligonucleotide primer having a tag-(N)_(x)-TARGET nucleotide sequence, wherein the TARGET nucleotide sequence includes less than 7 TARGET nucleotide residues and wherein x is an integer from 0 to 9, wherein each N is any nucleotide residue, and wherein each tag is a polynucleotide having from 0 to about 20 nucleotide residues.
 69. A composition, comprising: a plurality of RCGs immobilized in an ordered array on a surface.
 70. The composition of claim 69, wherein the RCGs are prepared by the method of claim 125.
 71. The composition of claim 69, wherein the RCGs are PCR-generated RCGs.
 72. The composition of claim 69, wherein the RCGs are native RCGs.
 73. A kit, comprising: a container housing a set of polymerase chain reaction primers for reducing the complexity of a genome, and a container housing a set of SNP-ASOs, wherein the SNPs are present with a frequency of at least 50% in a RCG made using the set of primers.
 74. The kit of claim 73, wherein the SNP-ASOs are attached to a surface.
 75. The kit of any one of claims 73 or 74, wherein the set of polymerase chain reaction primers are primers for DOP-PCR.
 76. The kit of claim 75, wherein the degenerate oligonucleotide primer has a tag-(N)_(x)-TARGET nucleotide sequence, wherein the TARGET nucleotide sequence includes at least 7 TARGET nucleotide residues and wherein x is an integer from 0 to 9, wherein each N is any nucleotide residue, and wherein each tag is a polynucleotide having from 0 to about 20 nucleotide residues.
 77. The kit of claim 76, wherein the TARGET nucleotide sequence includes at least 8 nucleotide residues.
 78. The kit of claim 76, wherein the TARGET nucleotide sequence includes at least 9 nucleotide residues.
 79. The kit of claim 76, wherein the TARGET nucleotide sequence includes at least 10 nucleotide residues.
 80. The kit of claim 76, wherein the TARGET nucleotide sequence includes at least 11 nucleotide residues.
 81. The kit of claim 76, wherein the TARGET nucleotide sequence includes 12 nucleotide residues.
 82. The kit of any one of claims 73 or 74, wherein the set of polymerase chain reaction primers are primers for IRS-PCR.
 83. The kit of any one of claims 73 or 74, wherein the set of polymerase chain reaction primers are primers for AP-PCR.
 84. The kit of any one of claims 73 or 74, wherein the set of polymerase chain reaction primers are primers for adapter-polymerase chain reaction.
 85. The kit of any one of claims 73 or 74, wherein the SNP-ASOs are composed of from 10 to 50 nucleotide residues.
 86. The kit of any one of claims 73 or 74, wherein the SNP-ASOs are composed of from 10 to 25 nucleotide residues.
 87. The kit of any one of claims 73 or 74, wherein the SNP-ASOs are labeled with a fluorescent molecule.
 88. The kit of claim 75, wherein the degenerate oligonucleotide primer has a tag-(N)_(x)-TARGET nucleotide sequence, wherein the TARGET nucleotide sequence includes fewer than 7 TARGET nucleotide residues and wherein x is an integer from 0 to 9, wherein each N is any nucleotide residue, and wherein each tag is a polynucleotide having from 0 to about 20 nucleotide residues.
 89. The kit of claim 73, wherein the set of polymerase chain reaction primers are primers for multiple-primed DOP-PCR.
 90. A composition comprising: a plurality of RCGs immobilized on a surface, wherein the RCGs are composed of a plurality of DNA fragments, each DNA fragment comprising a (N)_(x)-TARGET nucleotide portion, wherein the nucleotide sequence of TARGET is identical in each of the DNA fragments, wherein TARGET is a polynucleotide consisting of at least 7 nucleotide residues, wherein x is an integer from 0 to 9, and wherein N is any nucleotide residue.
 91. The composition of claim 90, wherein the TARGET nucleotide sequence includes 8 nucleotide residues.
 92. The composition of claim 90, wherein the TARGET nucleotide sequence includes 9 nucleotide residues.
 93. The composition of claim 90, wherein the TARGET nucleotide sequence includes 10 nucleotide residues.
 94. The composition of claim 90, wherein the TARGET nucleotide sequence includes 11 nucleotide residues.
 95. The composition of claim 90, wherein the TARGET nucleotide sequence includes 12 nucleotide residues.
 96. The composition of any one of claims 90-95, wherein x is from 3 to 9.
 97. The composition of any one of claims 90-95, wherein x is 6.
 98. The composition of any one of claims 90-95, wherein x is 7.
 99. The composition of any one of claims 90-95, wherein x is 8.
 100. The composition of any one of claims 90-95, wherein x is 9.
 101. A method for identifying a SNP, the method comprising: preparing a set of primers from a RCG, wherein the RCG comprises a set of polymerase chain reaction (PCR) products, performing PCR using the set of primers on at least one isolated genome to produce a set of DNA products, and identifying a SNP on the set of DNA products.
 102. The method of claim 101, wherein the plurality of isolated genomes is a pool of genomes.
 103. The method of claim 101, wherein the isolated genomes are RCGs.
 104. The method of claim 103, wherein the RCG is prepared by DOP-PCR.
 105. The method of claim 101, wherein the step of preparing the set of primers is performed by at least the following steps: preparing a RCG and separating the set of PCR products in the RCG into individual PCR products, determining the sequence of each end of at least one of the PCR products, and generating primers for use in the subsequent PCR step based on the sequence of the ends of the inserts.
 106. The method of claim 105, wherein the set of PCR products are separated by gel electrophoresis.
 107. The method of claim 106, further comprising the step of preparing libraries from segments of the gel containing several PCR products and isolating clones from the library, each clone including a PCR product containing plasmid from the library.
 108. The method of claim 105, wherein the set of PCR products are separated by high pressure liquid chromatography.
 109. The method of claim 105, wherein the set of PCR products are separated by column chromatography.
 110. The method of claim 101, wherein the RCG is prepared by performing DOP-PCR using a degenerate oligonucleotide primer having a tag-(N)_(x)-TARGET nucleotide sequence, wherein the TARGET nucleotide sequence includes at least 7 TARGET nucleotide residues and wherein x is an integer from 0 to 9, wherein each N is any nucleotide residue, and wherein each tag is a polynucleotide having from 0 to about 20 nucleotide residues.
 111. The method of claim 110, wherein the TARGET nucleotide sequence includes 8 nucleotide residues.
 112. The method of claim 110, wherein the TARGET nucleotide sequence includes 9 nucleotide residues.
 113. The method of claim 110, wherein the TARGET nucleotide sequence includes 10 nucleotide residues.
 114. The method of claim 110, wherein the TARGET nucleotide sequence includes 11 nucleotide residues.
 115. The method of claim 110, wherein the TARGET nucleotide sequence includes 12 nucleotide residues.
 116. The method of claim 101, wherein the RCG is prepared by IRS-PCR.
 117. The method of claim 101, wherein the RCG is prepared by AP-PCR.
 118. The method of claim 101, wherein the RCG is prepared by adapter-polymerase chain reaction.
 119. The method of claim 101, wherein the RCG is prepared by performing DOP-PCR using a degenerate oligonucleotide primer having a tag-(N)_(x)-TARGET nucleotide sequence, wherein the TARGET nucleotide sequence includes less than 7 TARGET nucleotide residues and wherein x is an integer from 0 to 9, wherein each N is any nucleotide residue, and wherein each tag is a polynucleotide having from 0 to about 20 nucleotide residues.
 120. The method of claim 101, wherein x is greater than one.
 121. The method of claim 101, wherein the first and second steps of PCR products are generated using the same primers.
 122. A composition comprising: a panel of SNP-ASOs immobilized on a surface, wherein the SNP-ASOs are prepared by the method of claim 101.
 123. The composition of claim 122, wherein each SNP-ASO is immobilized in a discrete area of the surface.
 124. The composition of claim 122, further comprising a panel of complementary SNP-ASOs immobilized on discrete areas of the surface.
 125. A method for obtaining a RCG using DOP-PCR, the method comprising: performing DOP-PCR using a degenerate oligonucleotide primer having a tag-(N)_(x)-TARGET nucleotide sequence, wherein the TARGET nucleotide sequence includes at least 7 TARGET nucleotide residues and wherein x is an integer from 0 to 9, wherein each N is any nucleotide residue, and wherein each tag is a polynucleotide having from 0 to about 20 nucleotide residues.
 126. The method of claim 125, wherein the TARGET nucleotide sequence includes 8 nucleotide residues.
 127. The method of claim 125, wherein the TARGET nucleotide sequence includes 9 nucleotide residues.
 128. The method of claim 125, wherein the TARGET nucleotide sequence includes 10 nucleotide residues.
 129. The method of claim 125, wherein the TARGET nucleotide sequence includes 11 nucleotide residues.
 130. The method of claim 125, wherein the TARGET nucleotide sequence includes 12 nucleotide residues.
 131. The method of any one of claims 125-130, wherein x is from 3 to 9.
 132. The method of any one of claims 125-130, wherein x is 6.
 133. The method of any one of claims 125-130, wherein x is 7.
 134. The method of any one of claims 125-130, wherein x is 8.
 135. The method of any one of claims 125-130, wherein x is 9.
 136. The method of claim 125, wherein the tag includes 6 nucleotide residues.
 137. The method of any one of claims 125-136, further comprising using the RCG in a genotyping procedure.
 138. The method of any one of claims 125-136, further comprising analyzing the RCG to detect a polymorphism.
 139. The method of claim 138, wherein the RCG is analyzed using mass spectroscopy.
 140. A method for assessing whether a subject is at risk for developing a disease, the method comprising: preparing a RCG from a genomic sample obtained from the subject and characterizing the sample by the method of claim 1, whether one sample based on the presence or absence in the sample of a plurality of SNP alleles that occur in at least 10% of genomes obtained from individuals afflicted with the disease occur in the reduced subject complexity genome.
 141. A method for identifying a set of SNP alleles associated with a disease, the method comprising: preparing individual RCGs obtained from subjects afflicted with a disease using the same set of primers to prepare each RCG, and comparing individual genetic loci in the RCGs with the same individual genetic loci in normal subjects to identify SNPs associated with the disease.
 142. A digital information product for representing genomic information, the product comprising: a computer-readable medium having computer-readable signals stored thereon, wherein the signals define a data structure, the data structure including one or more data components, wherein each data component includes: a first data element defining a genomic classification code that identifies a corresponding genome, and wherein each genomic classification code classifies the corresponding genome based on one or more single nucleotide polymorphisms of the corresponding genome.
 143. The digital information product of claim 142, wherein the genomic classification code is a unique identifier of the corresponding genome.
 144. The digital information product of claim 142, wherein the genomic classification code is based on a pattern of the single nucleotide polymorphisms of the corresponding genome, the pattern indicating the presence or absence of each single nucleotide polymorphism.
 145. The digital information product of claim 142, wherein each data component also includes: one or more data elements, each data element defining an attribute of the corresponding genome.
 146. A process for making a digital information product comprising computer data signals defining a genomic classification code for a genome, the process comprising: preparing a reduced complexity genome, performing a hybridization reaction with the reduced complexity genome and at least one surface having a panel of single nucleotide polymorphism oligonucleotides immobilized thereon, identifying a genomic pattern of single nucleotide polymorphisms for the reduced complexity genome by determining the presence therein of single nucleotide polymorphisms based on whether each single nucleotide polymorphism oligonucleotide hybridizes to the reduced complexity genome, generating a genomic classification code for the reduced complexity genome based on the genomic pattern of the single nucleotide polymorphisms, and encoding the genomic classification code as one or more computer data signals on a computer-readable medium.
 147. A process for making a digital information product comprising computer data signals defining a genomic classification code for a genome, the process comprising: preparing a reduced complexity genome, performing a hybridization reaction with a panel of single nucleotide polymorphism oligonucleotides of known allelic frequency and a surface having the reduced complexity genome immobilized thereon, identifying a genomic pattern of single nucleotide polymorphisms for the reduced complexity genome by determining the presence therein of single nucleotide polymorphisms based on whether each single nucleotide polymorphism oligonucleotide hybridizes to the reduced complexity genome, generating a genomic classification code for the reduced complexity genome based on the pattern and the allelic frequency of the single nucleotide polymorphisms, and encoding the genomic classification code as one or more computer data signals on a computer-readable medium.
 148. A method for performing linkage analysis, comprising: preparing individual RCGs obtained from members of one or more families, determining the presence or absence of SNP alleles in the RCGs, and comparing the RCGs of the family members by comparing the presence or absence of the SNP alleles in the RCGs of the family members.
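
By way of illustration only, the presence/absence genomic pattern, the genomic classification code, and the allelic frequency recited in the claims above could be computed along the following lines. This is a minimal sketch and not an encoding prescribed by the claims: the function names, the SNP identifiers, the hybridization calls, and the choice of a bit string as the code are all assumptions introduced here.

```python
from typing import Dict, List


def allele_frequency(detected: List[bool]) -> float:
    """Number of samples in which the allele is detected, divided by
    the number of samples analyzed."""
    if not detected:
        raise ValueError("at least one genomic sample is required")
    return sum(detected) / len(detected)


def genomic_pattern(hybridized: Dict[str, bool], panel: List[str]) -> List[int]:
    """Presence (1) or absence (0) of each SNP allele, in panel order."""
    return [1 if hybridized[snp] else 0 for snp in panel]


def classification_code(pattern: List[int]) -> str:
    """Assumed encoding: the pattern written out as a bit string."""
    return "".join(str(bit) for bit in pattern)


# Hypothetical panel of SNP-ASOs and hybridization calls for one RCG.
panel = ["snp_a", "snp_b", "snp_c", "snp_d"]
calls = {"snp_a": True, "snp_b": False, "snp_c": True, "snp_d": True}

pattern = genomic_pattern(calls, panel)
code = classification_code(pattern)

# Hypothetical detection results for one allele across four samples.
freq = allele_frequency([True, False, True, True])
```

A bit string is used here only because it maps directly onto the presence or absence of each allele; a code that also weights alleles by their frequency would need a different, equally hypothetical, encoding.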